All the material presented here, to the extent it is original, is available under CC-BY-SA.
I am running R 4.1.0, with packages recently updated using update.packages().
needed <- c("gstat", "spatstat", "spatstat.linnet", "spatstat.core",
"spatstat.geom", "spatstat.data", "ggplot2", "igraph", "spdep",
"spData", "sp", "tmap", "tidycensus", "sf")
Script and data at https://github.com/rsbivand/UAM21_II/raw/main/UAM21_II_210607.zip. Download to suitable location, unzip and use as basis.
Time | Topic |
---|---|
Monday 7/6 | |
09.00-12.00 | How may we integrate spatial data from different sources? How may we aggregate spatial data? Which data structures are helpful in handling spatial data? |
13.00-16.00 | Observations of spatial data are related to each other either by distance, or by a graph of edges linking observations seen as being neighbours. How may they be constructed? How may we address the issues raised by the probable presence of spatial autocorrelation in the spatial data that we are using? |
Tuesday 8/6 | |
09.00-12.00 | How can we measure global spatial autocorrelation? |
13.00-16.00 | How can we measure local spatial autocorrelation? |
Monday 14/6 | |
09.00-12.00 | How may we fit regression models to spatial data in the presence of spatial autocorrelation? Maximum likelihood and spatial filtering, case weights. |
13.00-16.00 | How should we interpret the coefficients or impacts of spatial regression models? How may we predict from spatial regression models? |
Tuesday 15/6 | |
09.00-12.00 | Multi-level models: at which level in the data may we fit spatial processes? |
13.00-16.00 | Spatial filtering, hierarchical GLM, GAM, etc., spatial epidemiological approaches |
Monday 21/6 | |
09.00-12.00 | Presentations/consultations/discussion |
13.00-16.00 | Presentations/consultations/discussion |
In the first part of this two-part course on handling and analysing geo-spatial socio-economic data, we spent some time looking at ways of getting hold of data, and of merging geometries with data on interesting variables. Referring to https://rsbivand.github.io/UAM21_I/UAM21_I_210506.html may be helpful for those needing a refresher. I’ll use a large data set downloaded using the US Census API and the tidycensus package here to replicate Folch et al. (2016), supplemented by numerous other data sets. The article examines the problem of large coefficients of variation (CV) in estimates for Census tracts in the sample-based American Community Survey (ACS):
library(sf)
## Linking to GEOS 3.10.0dev, GDAL 3.3.0, PROJ 8.0.1
library(tidycensus)
options(tigris_use_cache=TRUE)
To run the download script, an API key is required:
census_api_key("MY_KEY")
We make a vector of state FIPS letter codes, omitting Alaska and Hawaii, selecting by index number, not FIPS state codes, which would have meant dropping "02" and "15" from "01" to "56":
(us <- unique(fips_codes$state)[c(1, 3:11, 13:51)])
## [1] "AL" "AZ" "AR" "CA" "CO" "CT" "DE" "DC" "FL" "GA" "ID" "IL" "IN" "IA" "KS"
## [16] "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH" "NJ" "NM"
## [31] "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VT" "VA"
## [46] "WA" "WV" "WI" "WY"
For each download step we use lapply() to apply a function to each element of the state FIPS vector in turn; first we download the tract geometries by state, with the ACS population total. The returned results are in the mp list, which we join by rows (rbind()) to create an "sf" object for 2010 boundaries.
f <- function(x) {
get_acs(geography="tract", variables=c(tot_pop="B01003_001"), year=2010,
state=x, geometry=TRUE)
}
mp <- lapply(us, f)
map10 <- do.call("rbind", mp)
We also download the tract median incomes and their “margins of error” (MOE) in the same way, creating an output "data.frame" object of 2010 values:
f <- function(x) {
get_acs(geography="tract", variables=c(median_income="B19013_001"), year=2010, state=x)
}
mp <- lapply(us, f)
med_inc_acs10 <- do.call("rbind", mp)
Finally we download 2010 Census results by tract by state and create another "data.frame" object:
f <- function(x) {
get_decennial(geography="tract", variables=c(tot_pop="P001001", tot_hu="H001001", vacant="H003003", group_pop="P042001", black_tot="P008004", hisp_tot="P004003", m70_74="P012022", m75_79="P012023", m80_84="P012024", m85p="P012025", f70_74="P012046", f75_79="P012047", f80_84="P012048", f85p="P012049"), year=2010, state=x, output="wide")
}
mp <- lapply(us, f)
cen10 <- do.call("rbind", mp)
We have now downloaded the data, and can merge the "sf" object with the first "data.frame" object, keying on "GEOID", the tract FIPS code. We subset the columns and rename those retained:
df <- merge(map10, med_inc_acs10, by="GEOID")
df1 <- df[,-c(2, 3, 6, 7)]
names(df1) <- c("GEOID", "tot_pop_acs", "tot_pop_moe", "med_inc_acs", "med_inc_moe", "geometry")
names(attr(df1, "agr")) <- names(df1)[-6]
Next we merge that "sf" object with the Census "data.frame" object, again keying on "GEOID", the tract FIPS code. In the article, only tracts with more than 500 inhabitants and more than 200 housing units were retained; in addition, tracts with missing median income MOE values were dropped:
df_tracts_a <- merge(df1, cen10, by="GEOID")
df_tracts <- df_tracts_a[df_tracts_a$tot_pop > 500 & df_tracts_a$tot_hu > 200,]
df_tracts <- df_tracts[!is.na(df_tracts$med_inc_moe),]
Next we convert the MOE values back to the coefficient of variation (CV) for the two ACS variables used; ACS margins of error are reported at the 90% confidence level, so dividing by 1.645 recovers the standard error:
df_tracts$tot_pop_cv <- (df_tracts$tot_pop_moe/1.645)/df_tracts$tot_pop_acs
df_tracts$med_inc_cv <- (df_tracts$med_inc_moe/1.645)/df_tracts$med_inc_acs
and create rates variables:
# rowSums(), not sum(): per-tract total of the eight 70-plus age count columns
df_tracts$old_rate <- rowSums(as.data.frame(df_tracts)[,13:20])/df_tracts$tot_pop
df_tracts$black_rate <- df_tracts$black_tot/df_tracts$tot_pop
df_tracts$hisp_rate <- df_tracts$hisp_tot/df_tracts$tot_pop
df_tracts$vacancy_rate <- df_tracts$vacant/df_tracts$tot_hu
Concluding, we use s2 (through sf) to calculate spherical areas, converted to acres, and add a population density variable (inhabitants per acre), before saving the "sf" object as a GeoPackage file:
library(s2)
df_tracts$area <- NISTunits::NISTsqrMeterTOacre(st_area(df_tracts))
df_tracts$dens <- df_tracts$tot_pop/df_tracts$area
st_write(df_tracts, "df_tracts.gpkg", append=FALSE)
Reading the saved "sf" object, we can see that we have 71,353 observations, as in the article we are replicating:
df_tracts <- st_read("df_tracts.gpkg")
## Reading layer `df_tracts' from data source
## `/home/rsb/und/uam21/units_2/df_tracts.gpkg' using driver `GPKG'
## Simple feature collection with 71353 features and 28 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7332 ymin: 24.5447 xmax: -66.9499 ymax: 49.38436
## Geodetic CRS: NAD83
Visualising that many objects is possible, but reducing the area of interest to the Chicago Metropolitan Area, as the source article does, makes plotting quicker:
chicago_MA <- read.table("Chicago_MA.txt", colClasses=c("character", "character"))
chicago_MA_tracts <- !is.na(match(substring(df_tracts$GEOID, 1, 5), chicago_MA$V2))
sum(chicago_MA_tracts)
## [1] 2195
We’ll again use the tmap package with class interval styles from the classInt package:
library(tmap)
tm_shape(df_tracts[chicago_MA_tracts,]) + tm_fill("med_inc_cv", style="fisher", n=7, title="Coefficient of Variation")
In ESRI documentation, CV thresholds of 12 and 40 percent are proposed for the transformed reported MOE values: https://doc.arcgis.com/en/esri-demographics/data/acs.htm. We’ll create a classified variable (ordered factor):
df_tracts$mi_cv_esri <- cut(df_tracts$med_inc_cv, c(0, 0.12, 0.40, Inf), labels=c("High", "Medium", "Low"), right=TRUE, include.lowest=TRUE, ordered_result=TRUE)
table(df_tracts$mi_cv_esri)
##
## High Medium Low
## 48302 22610 441
and map it:
tm_shape(df_tracts[chicago_MA_tracts,]) + tm_fill("mi_cv_esri", title="Reliability")
As the Low reliability tracts are small in size, the "view" mode for interactive mapping may help:
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(df_tracts[chicago_MA_tracts,]) + tm_fill("mi_cv_esri", title="Reliability")
tmap_mode("plot")
## tmap mode set to plotting
Or equivalently using mapview, plotting the CV values rather than the ordered factor, which is not yet well-supported:
library(mapview)
mapviewOptions(fgb = FALSE)
mapview(df_tracts[chicago_MA_tracts,"med_inc_cv"], layer.name="Coefficient of Variation")
Areal units of observation are very often used when simultaneous observations are aggregated within non-overlapping boundaries. The boundaries may be those of administrative entities, and may be related to underlying spatial processes, such as commuting flows, but are usually arbitrary. If they do not match the underlying and unobserved spatial processes in one or more variables of interest, proximate areal units will contain parts of the underlying processes, engendering spatial autocorrelation. By proximity, we mean closeness in ways that make sense for the data generation processes thought to be involved. In cross-sectional geostatistical analysis with point support, measured distance makes sense for typical data generation processes. In similar analysis of areal data, sharing a border may make more sense, because that is what we do know, but we cannot measure the distance between the areas in as adequate a way.
By the support of data we mean the physical size (length, area, volume) associated with an individual observational unit (measurement). It is possible to represent the support of areal data by a point, despite the fact that the data have polygonal support. The centroid of the polygon may be taken as a representative point, or the centroid of the largest polygon in a multi-polygon object. When data with intrinsic point support are treated as areal data, the change of support goes the other way, from the known point to a non-overlapping tessellation such as a Voronoi diagram (also known as a Dirichlet tessellation or Thiessen polygons), often derived from a Delaunay triangulation using projected coordinates. Here, different metrics may also be chosen, or distances measured on a network rather than on the plane. There is also a literature using weighted Voronoi diagrams in local spatial analysis (see for example Boots and Okabe 2007; Okabe et al. 2008; She et al. 2015).
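As a minimal sketch of this change of support (with invented points, assuming projected coordinates), a Voronoi tessellation can be built with sf:
library(sf)
set.seed(1)
pts <- st_as_sf(data.frame(x=runif(10), y=runif(10)), coords=c("x", "y"))
# st_voronoi() returns a GEOMETRYCOLLECTION; extract the polygon cells
vor <- st_collection_extract(st_voronoi(st_union(pts)), "POLYGON")
# note: the cells are not returned in the same order as the input points
plot(vor, border="grey")
plot(st_geometry(pts), add=TRUE)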
When the intrinsic support of the data is as points, but the underlying process is between proximate observations rather than driven chiefly by distance however measured between observations, the data may be aggregate counts or totals (polling stations, retail turnover) or represent a directly observed characteristic of the observation (opening hours of the polling station). Obviously, the risk of mis-representing the footprint of the underlying spatial processes remains in all of these cases, not least because the observations are taken as encompassing the entirety of the underlying process in the case of tessellation of the whole area of interest. This is distinct from the geostatistical setting, in which observations are rather samples taken using some scheme within the area of interest. It is also partly distinct from the practice of taking areal sample plots within the area of interest but covering only a small proportion of the area, typically used in ecological and environmental research.
In order to explore and analyse areal data of these kinds, methods are needed to represent the proximity of observations. This chapter considers a subset of such methods, where the spatial processes are considered as working through proximity understood in the first instance as contiguity, as a graph linking observations taken as neighbours. This graph is typically undirected and unweighted, but may be directed and/or weighted in certain settings, which then leads to further issues with regard to symmetry. In principle, proximity would be expected to operate symmetrically in space, that is, that the influence of \(i\) on \(j\) and of \(j\) on \(i\) based on their relative positions should be equivalent. Edge effects are not considered in standard treatments.
Handling spatial autocorrelation using relationships to neighbours on a graph takes the graph as given, chosen by the analyst. This differs from the geostatistical approach, in which the analyst chooses the binning of the empirical variogram and the function used, and then the way in which the variogram model is fitted. Both involve a priori choices, but represent the underlying correlation in different ways (Wall 2004). In Bavaud (1998) and work citing his contribution, attempts have been made to place graph-based neighbours in a broader context.
One issue arising in the creation of objects representing neighbourhood relationships is that of no-neighbour areal units (R. S. Bivand and Portnov 2004). Islands or units separated by rivers may not be recognised as neighbours when the units have areal support and when using topological relationships such as shared boundaries. In some settings, for example mrf (Markov random field) terms in mgcv::gam() and similar model-fitting functions that require undirected connected graphs, this requirement is violated when there are disconnected subgraphs.
No-neighbour observations can also occur when a distance threshold is used between points, where the threshold is smaller than the maximum nearest neighbour distance. Shared boundary contiguities are not affected by using geographical, unprojected coordinates, but all point-based approaches use distance in one way or another, and need to calculate distances in an appropriate way.
The spdep package provides an nb class for neighbours, a list of length equal to the number of observations, with integer vector components. No-neighbour observations are encoded as an integer vector with a single element 0L, and observations with neighbours as sorted integer vectors containing values in 1L:n pointing to the neighbouring observations. This is a typical row-oriented sparse representation of neighbours. spdep provides many ways of constructing nb objects, and the representation and construction functions are widely used in other packages.
spdep builds on the nb representation (undirected or directed graphs) with the listw object, a list with three components: an nb object, a matching list of numerical weights, and a single-element character vector containing the single-letter name of the way in which the weights were calculated. The most frequently used approach in the social sciences is calculating weights by row standardization, so that all the non-zero weights for one observation will be the inverse of the cardinality of its set of neighbours (1/card(nb[[i]])).
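A minimal sketch with an invented four-unit chain a-b-c-d (not course data) shows both structures:
library(spdep)
# hand-built nb object: unit 1 neighbours 2; unit 2 neighbours 1 and 3; and so on
nb_toy <- structure(list(2L, c(1L, 3L), c(2L, 4L), 3L), class="nb",
region.id=c("a", "b", "c", "d"))
card(nb_toy) # cardinalities: 1 2 2 1
lw_toy <- nb2listw(nb_toy, style="W")
lw_toy$weights # row-standardized: 1; 0.5 0.5; 0.5 0.5; 1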
We will be using election data from the 2015 Polish Presidential election in this chapter, covering 2495 municipalities and Warsaw boroughs (the latter one of the municipality types), with complete count data from polling stations aggregated to these areal units. The data are available as an "sf" object in the spDataLarge package:
data(pol_pres15, package="spDataLarge")
pol_pres15 |>
subset(select=c(TERYT, name, types)) |>
head()
## Simple feature collection with 6 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 235157.1 ymin: 366913.3 xmax: 281431.7 ymax: 413016.6
## Projected CRS: ETRS89 / Poland CS92
## TERYT name types geometry
## 1 020101 BOLESŁAWIEC Urban MULTIPOLYGON (((261089.5 38...
## 2 020102 BOLESŁAWIEC Rural MULTIPOLYGON (((254150 3837...
## 3 020103 GROMADKA Rural MULTIPOLYGON (((275346 3846...
## 4 020104 NOWOGRODZIEC Urban/rural MULTIPOLYGON (((251769.8 37...
## 5 020105 OSIECZNICA Rural MULTIPOLYGON (((263423.9 40...
## 6 020106 WARTA BOLESŁAWIECKA Rural MULTIPOLYGON (((267030.7 38...
library(tmap)
tm_shape(pol_pres15) + tm_fill("types")
Between early 2002 and April 2019, spdep contained functions for constructing and handling neighbour and spatial weights objects, tests for spatial autocorrelation, and model fitting functions. The latter have been split out into spatialreg, and will be discussed in the next chapter. spdep now accommodates objects represented using sf classes and sp classes directly.
library(spdep)
## Loading required package: sp
## Loading required package: spData
The poly2nb() function in spdep takes the boundary points making up the polygon boundaries in the object passed as the pl= argument, and for each observation checks whether at least one (queen=TRUE, the default) or at least two (rook, queen=FALSE) boundary points are within snap= distance units of each other. The distances are planar in the raw coordinate units, ignoring whether the coordinates are geographical or projected. Once the required number of sufficiently close points is found, the search is stopped.
args(poly2nb)
## function (pl, row.names = NULL, snap = sqrt(.Machine$double.eps),
## queen = TRUE, useC = TRUE, foundInBox = NULL, small_n = 500)
## NULL
Two other arguments are also worth discussing, foundInBox= and small_n=. The first, foundInBox=, accepted the output of the rgeos gUnarySTRtreeQuery() function listing candidate neighbours, that is polygons whose bounding boxes intersect the bounding boxes of other polygons. From spdep 1.1-7, the GEOS interface of the sf package is used within poly2nb() if foundInBox=NULL and the number of observations is greater than small_n=, to find the candidate neighbours and populate foundInBox internally. In this case, this use of spatial indexing (STRtree queries) in GEOS through sf is the default, as the number of observations is greater than small_n:
system.time(pol_pres15 |> poly2nb(queen=TRUE) -> nb_q)
## user system elapsed
## 0.836 0.019 0.858
The print method shows the summary structure of the neighbour object:
nb_q
## Neighbour list object:
## Number of regions: 2495
## Number of nonzero links: 14242
## Percentage nonzero weights: 0.2287862
## Average number of links: 5.708216
Raising small_n= above the observation count, we see that the processing time is increased a little; the benefits of indexing are more apparent with larger data sets.
system.time(pol_pres15 |> poly2nb(queen=TRUE, small_n=2500) -> nb_q_legacy)
## user system elapsed
## 1.176 0.008 1.188
The output objects are identical:
all.equal(nb_q, nb_q_legacy, check.attributes=FALSE)
## [1] TRUE
Much of the work involved in finding contiguous neighbours is spent on finding candidate neighbours with intersecting bounding boxes. Note that nb objects record each symmetric neighbour relationship twice, because the representation also admits asymmetric relationships, even though these duplications are not needed for object construction.
Most of the spdep functions for constructing neighbour objects take a row.names= argument, the value of which is stored as a region.id attribute. If not given, the values are taken from row.names() of the first argument. These can be used to check that the neighbours object is in the same order as the data. If nb objects are subsetted, the indices change to continue to lie within 1:length(subsetted_nb), but the region.id attribute values point back to the object from which the subset was constructed.
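As a sketch using the nb_q object built above, subsetting to the first ten units re-indexes the neighbour pointers (dropping links to units outside the subset), while region.id retains the original identifiers:
keep <- seq_len(length(nb_q)) <= 10
nb_q10 <- subset(nb_q, keep)
attr(nb_q10, "region.id")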
We can also check that this undirected graph is connected using the n.comp.nb() function; while some model estimation techniques do not support graphs that are not connected, it is helpful to be aware of possible problems (Freni-Sterrantino, Ventrucci, and Rue 2018):
(nb_q |> n.comp.nb())$nc
## [1] 1
This approach is equivalent to treating the neighbour object as a graph and using graph analysis on that graph (Csardi and Nepusz 2006), by first coercing to a binary sparse matrix:
library(Matrix)
nb_q |>
nb2listw(style="B") |>
as("CsparseMatrix") -> smat
library(igraph)
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
(smat |>
graph.adjacency() -> g1) |>
count_components()
## [1] 1
Neighbour objects may be exported and imported in GAL format for exchange with other software, using write.nb.gal() and read.gal():
tf <- tempfile(fileext=".gal")
write.nb.gal(nb_q, tf)
If areal units are an appropriate representation, but only points have been observed, contiguity relationships may be approximated using graph-based neighbours. In this case, the imputed boundaries tessellate the plane such that points closer to one observation than to any other fall within its polygon. The simplest form uses triangulation, here with the deldir() function in the deldir package. Because the function returns from and to identifiers, it is easy to construct a long representation of a listw object, as used in the S-Plus SpatialStats module; the sn2listw() function is used internally to construct an nb object (ragged wide representation). Alternatives often fail to return sufficient information to permit the neighbours to be identified.
The soi.graph() function takes triangulated neighbours and prunes off neighbour relationships represented by unusually long edges, especially around the convex hull, but may render the output object asymmetric. Other graph-based approaches include relativeneigh() and gabrielneigh().
The output of these functions is then converted to the nb representation using graph2nb(), with the possible use of the sym= argument to coerce to symmetry. We take the centroids of the largest component polygon for each observation as the point representation; population-weighted centroids might have been a better choice if they were available:
pol_pres15 |>
st_geometry() |>
st_centroid(of_largest_polygon=TRUE) -> coords
(coords |> tri2nb() -> nb_tri)
## Neighbour list object:
## Number of regions: 2495
## Number of nonzero links: 14930
## Percentage nonzero weights: 0.2398384
## Average number of links: 5.983968
The average number of neighbours is similar to the Queen boundary contiguity case, but if we look at the distribution of edge lengths using nbdists(), we can see that although the upper quartile is about 15 km, the maximum is almost 300 km, an edge along much of one side of the convex hull. The short minimum distance is also of interest, as many centroids of urban municipalities are very close to the centroids of their surrounding rural counterparts.
nb_tri |>
nbdists(coords) |>
unlist() |>
summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 246.6 9847.2 12151.2 13485.2 14993.5 296973.7
Triangulated neighbours also yield a connected graph:
(nb_tri |> n.comp.nb())$nc
## [1] 1
The sphere of influence graph trims a neighbour object such as nb_tri to remove edges that seem long in relation to typical neighbours (Avis and Horton 1985):
(nb_tri |>
soi.graph(coords) |>
graph2nb() -> nb_soi)
## Neighbour list object:
## Number of regions: 2495
## Number of nonzero links: 12792
## Percentage nonzero weights: 0.2054932
## Average number of links: 5.127054
Unpicking the triangulated neighbours does however remove the connected character of the underlying graph:
(nb_soi |> n.comp.nb() -> n_comp)$nc
## [1] 16
The SoI algorithm has stripped out longer edges leading to urban and rural municipality pairs where their centroids are very close to each other because the rural ones completely surround the urban, giving 15 pairs of neighbours unconnected to the main graph:
table(n_comp$comp.id)
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 2465 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
The largest length edges along the convex hull have been removed, but “holes” have appeared where the unconnected pairs of neighbours are found. The differences between nb_tri and nb_soi are shown in orange.
opar <- par(mar=c(0,0,0,0)+0.5)
pol_pres15 |>
st_geometry() |>
plot(border="grey", lwd=0.5)
nb_soi |>
plot(coords=coords, add=TRUE, points=FALSE, lwd=0.5)
nb_tri |>
diffnb(nb_soi) |>
plot(coords=coords, col="orange", add=TRUE, points=FALSE, lwd=0.5)
par(opar)
Distance-based neighbours can be constructed using dnearneigh(), with a distance band defined by lower d1= and upper d2= bounds, whose closure is controlled by the bounds= argument. If spherical coordinates are used, either specified in the coordinates object x or with x given as a two-column matrix and longlat=TRUE, great circle distances in km will be calculated assuming the WGS84 reference ellipsoid. From spdep 1.1-7, two arguments have been added: use_kd_tree=, using functionality in the dbscan package to find neighbours through a kd-tree in two or three dimensions by default, and another to skip testing the symmetry of the output neighbour object.
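A minimal sketch with invented geographical coordinates; with longlat=TRUE the band limits are taken as great circle distances in km, so here the roughly 42 km steps between successive points fall inside the 0-60 km band:
pts_ll <- cbind(lon=c(0, 0.5, 1), lat=c(50, 50.2, 50.4))
dnearneigh(pts_ll, d1=0, d2=60, longlat=TRUE)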
The knearneigh() function for \(k\)-nearest neighbours returns a knn object, converted to an nb object using knn2nb(). It can also use great circle distances, not least because nearest neighbours may differ when unprojected coordinates are treated as planar. k= should be a small number. For projected coordinates, the dbscan package is used to compute nearest neighbours more efficiently. Note that nb objects constructed in this way are most unlikely to be symmetric, hence knn2nb() has a sym= argument to permit the imposition of symmetry, which will mean that all units have at least k= neighbours, not that all units will have exactly k= neighbours.
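A small sketch with invented planar points illustrates the effect: after knn2nb(..., sym=TRUE), each unit has at least, but not necessarily exactly, two neighbours:
set.seed(1)
pts_k <- matrix(runif(40), ncol=2) # 20 invented points
nb_k2 <- knn2nb(knearneigh(pts_k, k=2), sym=TRUE)
table(card(nb_k2)) # cardinalities of 2 or more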
The nbdists() function returns the length of neighbour relationship edges in the units of the coordinates if the coordinates are projected, in km otherwise. In order to set the upper limit for distance bands, one may first find the maximum first nearest neighbour distance, using unlist() to remove the list structure of the returned object.
coords |>
knearneigh(k=1) |>
knn2nb() |>
nbdists(coords) |>
unlist() |>
summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 246.5 6663.4 8538.0 8275.1 10123.9 17978.8
Here the largest first nearest neighbour distance is just under 18 km, so using this as the upper threshold gives certainty that all units will have at least one neighbour:
system.time(coords |> dnearneigh(0, 18000) -> nb_d18)
## user system elapsed
## 0.152 0.000 0.153
system.time(coords |> dnearneigh(0, 18000, use_kd_tree=FALSE) -> nb_d18a)
## user system elapsed
## 0.144 0.000 0.144
all.equal(nb_d18, nb_d18a, check.attributes=FALSE)
## [1] TRUE
nb_d18
## Neighbour list object:
## Number of regions: 2495
## Number of nonzero links: 20358
## Percentage nonzero weights: 0.3270348
## Average number of links: 8.159519
However, even though there are no no-neighbour observations (their presence is reported by the print method for nb objects), the graph is not connected, as a pair of observations are each other's only neighbours.
(nb_d18 |> n.comp.nb() -> n_comp)$nc
## [1] 2
table(n_comp$comp.id)
##
## 1 2
## 2493 2
Adding 300 m to the threshold gives us a neighbour object with no no-neighbour units, and all units can be reached from all others across the graph.
(coords |> dnearneigh(0, 18300) -> nb_d183)
## Neighbour list object:
## Number of regions: 2495
## Number of nonzero links: 21086
## Percentage nonzero weights: 0.3387296
## Average number of links: 8.451303
(nb_d183 |> n.comp.nb())$nc
## [1] 1
One characteristic of distance-based neighbours is that units in more densely settled areas, which are smaller in terms of area, have more neighbours (Warsaw boroughs are much smaller on average, and have almost 30 neighbours each). Having many neighbours smooths the neighbour relationship across more neighbours. For use later, we also construct a neighbour object with no-neighbour units, using a threshold of 16 km:
(coords |> dnearneigh(0, 16000) -> nb_d16)
## Neighbour list object:
## Number of regions: 2495
## Number of nonzero links: 15850
## Percentage nonzero weights: 0.2546175
## Average number of links: 6.352705
## 7 regions with no links:
## 569 1371 1522 2374 2385 2473 2474
It is possible to control the numbers of neighbours directly using \(k\)-nearest neighbours, either accepting asymmetric neighbours or imposing symmetry:
(coords |> knearneigh(k=6) -> knn_k6) |> knn2nb()
## Neighbour list object:
## Number of regions: 2495
## Number of nonzero links: 14970
## Percentage nonzero weights: 0.240481
## Average number of links: 6
## Non-symmetric neighbours list
(knn_k6 |> knn2nb(sym=TRUE) -> nb_k6s)
## Neighbour list object:
## Number of regions: 2495
## Number of nonzero links: 16810
## Percentage nonzero weights: 0.2700391
## Average number of links: 6.737475
Here the size of k= is sufficient to ensure connectedness, although the graph is not planar, as edges cross at locations other than nodes; this is not the case for contiguous or graph-based neighbours.
(nb_k6s |> n.comp.nb())$nc
## [1] 1
Once neighbour objects are available, further choices need to be made in specifying the weights objects. The nb2listw() function is used to create a listw weights object from an nb object, a matching list of weights vectors, and a style specification. Because handling no-neighbour observations now begins to matter, the zero.policy= argument is introduced. By default this is FALSE, indicating that no-neighbour observations will cause an error, as the spatially lagged value for an observation with no neighbours is not available. When TRUE, by convention zero is substituted for the lagged value, being the cross product of a vector of zero-valued weights and a data vector; hence the name zero.policy.
args(nb2listw)
## function (neighbours, glist = NULL, style = "W", zero.policy = NULL)
## NULL
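The zero lag convention can be sketched using nb_d16, which has seven no-neighbour units; I_turnout (first-round turnout) is assumed to be present in pol_pres15:
lw_zp <- nb2listw(nb_d16, style="B", zero.policy=TRUE)
# lagged values of the no-neighbour units are zero by convention
lag.listw(lw_zp, pol_pres15$I_turnout, zero.policy=TRUE)[card(nb_d16) == 0]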
We will be using the helper function spweights.constants() below to show some consequences of varying style choices. It returns constants for a listw object: \(n\) is the number of observations, n1 to n3 are \(n-1, \ldots, n-3\), nn is \(n^2\), and \(S_0\), \(S_1\) and \(S_2\) are constants, \(S_0\) being the sum of the weights. There is a full discussion of the constants in Bivand and Wong (2018).
args(spweights.constants)
## function (listw, zero.policy = NULL, adjust.n = TRUE)
## NULL
The "B"
binary style gives a weight of unity to each neighbour relationship, and typically upweights units with no boundaries on the edge of the study area.
(nb_q |>
nb2listw(style="B") -> lw_q_B) |>
spweights.constants() |>
data.frame() |>
subset(select=c(n, S0, S1, S2))
## n S0 S1 S2
## 1 2495 14242 28484 357280
The "W"
row-standardized style upweights units around the edge of the study area that necessarily have fewer neighbours. This style first gives a weight of unity to each neighbour relationship, then divides these weights by the per unit sums of weights. Naturally this leads to division by zero where there are no neighbours, a not-a-number result, unless the chosen policy is to permit no-neighbour observations. We can see that \(S_0\) is now equal to \(n\).
(nb_q |>
nb2listw(style="W") -> lw_q_W) |>
spweights.constants() |>
data.frame() |>
subset(select=c(n, S0, S1, S2))
## n S0 S1 S2
## 1 2495 2495 957.5303 10406.44
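As a quick check on the row standardization, the weights of each unit sum to one, so summing all weights gives \(S_0 = n\):
head(sapply(lw_q_W$weights, sum)) # each unit's weights sum to 1
sum(unlist(lw_q_W$weights)) # S0 equals n = 2495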
Inverse distance weights are used in a number of scientific fields. Some use dense inverse distance matrices, but many of the inverse distances are close to zero, so have little practical contribution, especially as the spatial process matrix is itself dense. Inverse distance weights may be constructed by taking the lengths of edges, changing units to avoid most weights being too large or too small (here from m to km), taking the inverse, and passing the result through the glist= argument to nb2listw():
nb_d183 |>
nbdists(coords) |>
lapply(\(x) 1/(x/1000)) -> gwts
(nb_d183 |> nb2listw(glist=gwts, style="B") -> lw_d183_idw_B) |>
spweights.constants() |>
data.frame() |>
subset(select=c(n, S0, S1, S2))
## n S0 S1 S2
## 1 2495 1841.345 533.7606 7264.958
By default, no-neighbour handling prevents the construction of a weights object, making the analyst take a position on how to proceed:
try(nb_d16 |> nb2listw(style="B") -> lw_d16_B)
## Error in nb2listw(nb_d16, style = "B") : Empty neighbour sets found
Use can be made of the zero.policy= argument to many functions used with nb and listw objects:
nb_d16 |>
nb2listw(style="B", zero.policy=TRUE) |>
spweights.constants(zero.policy=TRUE) |>
data.frame() |>
subset(select=c(n, S0, S1, S2))
## n S0 S1 S2
## 1 2488 15850 31700 506480
Note that the adjust.n= argument to spweights.constants() is set to TRUE by default, subtracting the count of no-neighbour observations from the observation count, so \(n\) is smaller, with possible consequences for inference. The complete count can be retrieved by changing the argument, as sketched below.
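For example, setting adjust.n=FALSE recovers the full observation count:
nb_d16 |>
nb2listw(style="B", zero.policy=TRUE) |>
spweights.constants(zero.policy=TRUE, adjust.n=FALSE) |>
data.frame() |>
subset(select=c(n, S0, S1, S2))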
We recall the characteristics of the neighbour object based on Queen contiguities:
nb_q
## Neighbour list object:
## Number of regions: 2495
## Number of nonzero links: 14242
## Percentage nonzero weights: 0.2287862
## Average number of links: 5.708216
If we wish to create an object showing \(i\) to \(k\) neighbours, where \(i\) is a neighbour of \(j\), and \(j\) in turn is a neighbour of \(k\), so taking two steps on the neighbour graph, we can use nblag(), which automatically removes \(i\) to \(i\) self-neighbours:
(nb_q |> nblag(2) -> nb_q2)[[2]]
## Neighbour list object:
## Number of regions: 2495
## Number of nonzero links: 32930
## Percentage nonzero weights: 0.5289939
## Average number of links: 13.1984
The nblag_cumul() function cumulates the list of neighbours for the whole list of lags:
nblag_cumul(nb_q2)
## Neighbour list object:
## Number of regions: 2495
## Number of nonzero links: 47172
## Percentage nonzero weights: 0.7577801
## Average number of links: 18.90661
while the set operation union.nb() takes two objects, giving here the same outcome:
union.nb(nb_q2[[2]], nb_q2[[1]])
## Neighbour list object:
## Number of regions: 2495
## Number of nonzero links: 47172
## Percentage nonzero weights: 0.7577801
## Average number of links: 18.90661
Returning to the graph representation of the same neighbour object, we can ask how many steps might be needed to traverse the graph:
diameter(g1)
## [1] 52
We step out from each observation across the graph to establish the number of steps needed to reach each other observation by the shortest path, once again finding the same count, and finding that the most remote municipality is Lutowiska, close to the Ukrainian border in the far south-east of the country.
g1 |> shortest.paths() -> sps
(sps |> apply(2, max) -> spmax) |> max()
## [1] 52
mr <- which.max(spmax)
pol_pres15$name0[mr]
## [1] "Lutowiska"
This shows that contiguity neighbours can represent the same kinds of relationships with other observations as distance-based neighbours. Some approaches prefer distance neighbours on the basis that, for example, inverse distance weights show explicitly how all observations are related to each other. However, the development of both tests for spatial autocorrelation and spatial regression models has involved the inverse of a spatial process model, which in turn can be represented as the sum of a power series of the neighbour weights and a coefficient, so intrinsically acknowledging the relationships of all with all others. Sparse contiguity neighbour objects accommodate rich dependency structures without the need to make the structures explicit.
Figure: relationship of shortest paths to distance for Lutowiska; left panel: shortest path counts from Lutowiska; right panel: plot of shortest paths from Lutowiska to other observations against distances (km) from Lutowiska to other observations.
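The power series view mentioned above can be sketched with sparse matrices; rho is an assumed coefficient value here, and the series is truncated after a few terms:
library(Matrix)
W <- as(nb2listw(nb_q, style="W"), "CsparseMatrix")
rho <- 0.5 # assumed spatial coefficient, |rho| < 1
I_n <- Diagonal(nrow(W))
# (I - rho W)^{-1} = I + rho W + rho^2 W^2 + ..., truncated here at cubic order
series <- I_n + rho * W + rho^2 * (W %*% W) + rho^3 * (W %*% W %*% W)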
Finally, we create contiguity neighbours for the US Census tracts, completing in a respectable 25 seconds:
library(spdep)
t0 <- system.time(nb_subset <- poly2nb(df_tracts, queen=TRUE, row.names=df_tracts$GEOID))
saveRDS(nb_subset, file="nb_subset.rds")
# user system elapsed
# 24.233 0.129 24.417
library(spdep)
(nb_subset <- readRDS("nb_subset.rds"))
## Neighbour list object:
## Number of regions: 71353
## Number of nonzero links: 438146
## Percentage nonzero weights: 0.008605862
## Average number of links: 6.140541
## 17 regions with no links:
## 12745 26628 29077 29120 32398 32775 43104 46344 46390 52303 58113 58261 70000 70055 70058 70457 70458
- Spatial, time series, and spatio-temporal data break the fundamental rule of independence of observations (social networks do too)
- Often, we are not sampling from a known population to get observations, so caution is needed for inference
- Often again, the observation units or entities are not chosen by us, but by others
- Designing spatial samples has become a less important part of the field than previously (Ripley 1981; Müller 2007)
- If we want to detect and classify patterns (ML), infer about covariates, or interpolate from a fitted model, we need models that take account of the possible interdependency of spatial, time series and spatio-temporal observations
- Here we are focussing on the spatial case; time differs in having a direction of flow, and spatio-temporal processes are necessarily more complex, especially when non-separable
- We will not look at machine learning issues, although the partition into training and test sets raises important spatial questions for tuning (Schratz et al. 2019)
- Spatial point processes are most closely tied to the relative positions of the observations, but may accommodate inhomogeneities
- Geostatistics is concerned with interpolating values of a variable of interest at unobserved locations, and the relative positions of the observations contribute through modelling a function of the differences in observed values of that variable between observations at increasing distances from each other
- Disease mapping, spatial econometrics/regression, and the application of generalized linear mixed models (GLMM, GAMM) in for example ecology are more interested in inference about the spatial processes in play and the included covariates; some approaches may use distance-based spatial correlation functions, others use graph or lattice neighbourhoods
- Sometimes when we use spatial data, we jump to spatial statistical methods because of the widely cited first law of geography, that nearby observations are more likely to be similar than those further away
- But maybe what we see as clustering, patchiness, or pattern that looks spatial is actually mis-specification, such as a missing covariate, and/or an inappropriate functional form, and/or variables acting at different scales, and/or consequences of suboptimal bounding of tessellated observations …
- Here, we'll first look at the consequences of treating an inhomogeneous point process as homogeneous (the intensity trends upwards with x)
- Then we'll use those points to see what happens when adding a similar trend to a random variable; empirical variograms are constructed ignoring and including the trend, and tests for global and local spatial autocorrelation are calculated
We can start by using spatstat (Baddeley, Rubak, and Turner 2015) to generate completely spatially random (CSR) points with intensity increasing with x, introducing trend inhomogeneity in a unit square (note the use of set.seed()):
suppressPackageStartupMessages(library(spatstat))
intenfun <- function(x, y) 200 * x
set.seed(1)
(ihpp <- rpoispp(intenfun, lmax = 200))
## Planar point pattern: 95 points
## window: rectangle = [0, 1] x [0, 1] units
plot(density(ihpp), axes=TRUE)
points(ihpp, col="green2", pch=19)
We can use \(\hat{K}\) tests ignoring and including inhomogeneity to adapt the test to the underlying data generation process. The homogeneous test examines the divergence from theoretical CSR at distance bins; the inhomogeneous test tries to allow for the kind of patterning in the relative positions of the point observations. If we ignore the inhomogeneity, we find significant clustering at most distances, with the opposite finding when using the test attempting to accommodate inhomogeneity:
opar <- par(mfrow=c(1,2))
plot(envelope(ihpp, Kest, verbose=FALSE), main="Homogeneous")
plot(envelope(ihpp, Kinhom, verbose=FALSE), main="Inhomogeneous")
par(opar)
Adding a y trend variable: coercing the spatstat "ppp" object to an sf "sf" object is trivial, but this first adds the window of the point process, which we need to drop by retaining only the features labelled "point". Then we take the x coordinate and use it to create y, which trends with x; the DGP is not very noisy.
library(sf)
sf_ihpp <- subset(st_as_sf(ihpp, crs=32662), label == "point")
#st_as_sf(ihpp) %>% dplyr::filter(label == "point") -> sf_ihpp
#st_as_sf(ihpp) ->.; .[.$label == "point",] -> sf_ihpp
crds <- st_coordinates(sf_ihpp)
sf_ihpp$x <- crds[,1]
sf_ihpp$y <- 100 + 50 * sf_ihpp$x + 20 * rnorm(nrow(sf_ihpp))
plot(sf_ihpp[,"y"], pch=19)
Variograms for models ignoring (y ~ 1) and including (y ~ x) the trend; if we ignore the trend, we find spurious relationships, shown by the loess() fits:
suppressPackageStartupMessages(library(gstat))
vg0 <- variogram(y ~ 1, sf_ihpp)
vg1 <- variogram(y ~ x, sf_ihpp)
library(ggplot2)
g0 <- ggplot(vg0, aes(x=dist, y=gamma)) + geom_point() + geom_smooth(span=1) + ylim(300, 575) + ggtitle("Trend ignored")
g1 <- ggplot(vg1, aes(x=dist, y=gamma)) + geom_point() + geom_smooth(span=1) + ylim(300, 575) + ggtitle("Trend included")
gridExtra::grid.arrange(g0, g1, ncol=2)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
Going on from a continuous to a discrete treatment of space, we can use functions in spdep to define neighbours and then test for global and local spatial autocorrelation. Making first a triangulated neighbour object, we can thin out improbable neighbours lying too far from one another, using a sphere of influence graph to make a symmetric neighbour object:
suppressPackageStartupMessages(library(spdep))
nb_tri <- tri2nb(crds)
(nb_soi <- graph2nb(soi.graph(nb_tri, crds), sym=TRUE))
## Neighbour list object:
## Number of regions: 95
## Number of nonzero links: 302
## Percentage nonzero weights: 3.34626
## Average number of links: 3.178947
The sphere of influence graph generates three subgraphs; a bit unfortunate, but we can live with that (for now):
plot(nb_soi, crds)
comps <- n.comp.nb(nb_soi)
sf_ihpp$comps <- comps$comp.id
comps$nc
## [1] 3
Using binary weights, looking at the variable but ignoring the trend, we find strong global spatial autocorrelation using global Moran’s \(I\) (tests return "htest" objects, tidied here using broom):
lwB <- nb2listw(nb_soi, style="B")
out <- broom::tidy(moran.test(sf_ihpp$y, listw=lwB, randomisation=FALSE, alternative="two.sided"))[1:5]
names(out)[1:3] <- c("Moran's I", "Expectation", "Variance"); out
## # A tibble: 1 x 5
## `Moran's I` Expectation Variance statistic p.value
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.342 -0.0106 0.00633 4.44 0.00000917
If, however, we take the residuals after including the trend using a linear model, the apparent spatial autocorrelation evaporates:
lm_obj <- lm(y ~ x, data=sf_ihpp)
out <- broom::tidy(lm.morantest(lm_obj, listw=lwB, alternative="two.sided"))[1:5]
names(out)[1:3] <- c("Moran's I", "Expectation", "Variance"); out
## # A tibble: 1 x 5
## `Moran's I` Expectation Variance statistic[,1] p.value[,1]
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.0579 -0.0199 0.00623 0.986 0.324
The same happens if we calculate local Moran’s \(I\) ignoring the trend (randomisation assumption), and including the trend (normality assumption and saddlepoint approximation):
lmor0 <- localmoran(sf_ihpp$y, listw=lwB, alternative="two.sided")
lmor1 <- as.data.frame(localmoran.sad(lm_obj, nb=nb_soi, style="B", alternative="two.sided"))
sf_ihpp$z_value <- lmor0[,4]
sf_ihpp$z_lmor1_N <- lmor1[,2]
sf_ihpp$z_lmor1_SAD <- lmor1[,4]
suppressPackageStartupMessages(library(tmap))
tm_shape(sf_ihpp) + tm_symbols(col=c("z_value", "z_lmor1_N", "z_lmor1_SAD"), midpoint=0) + tm_facets(free.scales=FALSE, nrow=1) + tm_layout(panel.labels=c("No trend", "Trend, normal", "Trend, SAD"))
Look at the North Carolina SIDS data vignette for background:
library(sf)
nc <- st_read(system.file("shapes/sids.shp", package="spData")[1], quiet=TRUE)
st_crs(nc) <- "+proj=longlat +datum=NAD27"
row.names(nc) <- as.character(nc$FIPSNO)
head(nc)
## Simple feature collection with 6 features and 22 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
## CRS: +proj=longlat +datum=NAD27
## CNTY_ID AREA PERIMETER CNTY_ NAME FIPS FIPSNO CRESS_ID BIR74
## 37009 1825 0.114 1.442 1825 Ashe 37009 37009 5 1091
## 37005 1827 0.061 1.231 1827 Alleghany 37005 37005 3 487
## 37171 1828 0.143 1.630 1828 Surry 37171 37171 86 3188
## 37053 1831 0.070 2.968 1831 Currituck 37053 37053 27 508
## 37131 1832 0.153 2.206 1832 Northampton 37131 37131 66 1421
## 37091 1833 0.097 1.670 1833 Hertford 37091 37091 46 1452
## SID74 NWBIR74 BIR79 SID79 NWBIR79 east north x y lon
## 37009 1 10 1364 0 19 164 176 -81.67 4052.29 -81.48594
## 37005 0 10 542 3 12 183 182 -50.06 4059.70 -81.14061
## 37171 5 208 3616 6 260 204 174 -16.14 4043.76 -80.75312
## 37053 1 123 830 2 145 461 182 406.01 4035.10 -76.04892
## 37131 9 1066 1606 3 1197 385 176 281.10 4029.75 -77.44057
## 37091 7 954 1838 5 1237 411 176 323.77 4028.10 -76.96474
## lat L_id M_id geometry
## 37009 36.43940 1 2 MULTIPOLYGON (((-81.47276 3...
## 37005 36.52443 1 2 MULTIPOLYGON (((-81.23989 3...
## 37171 36.40033 1 2 MULTIPOLYGON (((-80.45634 3...
## 37053 36.45655 1 4 MULTIPOLYGON (((-76.00897 3...
## 37131 36.38799 1 4 MULTIPOLYGON (((-77.21767 3...
## 37091 36.38189 1 4 MULTIPOLYGON (((-76.74506 3...
The variables are largely count data; L_id and M_id are grouping variables. We can also read the original neighbour object:
library(spdep)
gal_file <- system.file("weights/ncCR85.gal", package="spData")[1]
ncCR85 <- read.gal(gal_file, region.id=nc$FIPSNO)
ncCR85
## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 492
## Percentage nonzero weights: 4.92
## Average number of links: 4.92
plot(st_geometry(nc), border="grey")
plot(ncCR85, st_centroid(st_geometry(nc), of_largest_polygon), add=TRUE, col="blue")
## Warning in st_centroid.sfc(st_geometry(nc), of_largest_polygon): st_centroid
## does not give correct centroids for longitude/latitude data
Now generate a random variable. Here I’ve set the seed - maybe choose your own, and compare outcomes with the people around you. With many people in the room, about 5 in 100 may get a draw that is autocorrelated when tested with Moran’s \(I\) (why? see the sketch after the test output below).
set.seed(1)
nc$rand <- rnorm(nrow(nc))
lw <- nb2listw(ncCR85, style="B")
moran.test(nc$rand, listw=lw, alternative="two.sided")
##
## Moran I test under randomisation
##
## data: nc$rand
## weights: lw
##
## Moran I statistic standard deviate = -0.82038, p-value = 0.412
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## -0.060754158 -0.010101010 0.003812269
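A sketch answering the “(why?)”: testing at the 5% level rejects a true null hypothesis in about 5% of draws, which we can check by simulating many random variables (the seed and draw count are arbitrary):
set.seed(42)
p_vals <- replicate(100, moran.test(rnorm(nrow(nc)), listw=lw,
alternative="two.sided")$p.value)
mean(p_vals < 0.05) # roughly 5 in 100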
Now we’ll create a trend (maybe try plotting LM to see the trend pattern). Do we get different test outcomes by varying beta and sigma (alpha is constant)?
nc$LM <- as.numeric(interaction(nc$L_id, nc$M_id))
alpha <- 1
beta <- 0.5
sigma <- 2
nc$trend <- alpha + beta*nc$LM + sigma*nc$rand
moran.test(nc$trend, listw=lw, alternative="two.sided")
##
## Moran I test under randomisation
##
## data: nc$trend
## weights: lw
##
## Moran I statistic standard deviate = 6.1163, p-value = 9.58e-10
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.368410659 -0.010101010 0.003829901
To get back to reality, include the trend in a linear model, and test again.
lm.morantest(lm(trend ~ LM, nc), listw=lw, alternative="two.sided")
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = trend ~ LM, data = nc)
## weights: lw
##
## Moran I statistic standard deviate = -0.50788, p-value = 0.6115
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I Expectation Variance
## -0.049567147 -0.018718176 0.003689356
So we can manipulate a missing variable mis-specification to look like spatial autocorrelation. Is this informative?
Sometimes we only have data on a covariate for aggregates of our units of observation. What happens when we “copy out” these aggregate values to the less aggregated observations? First we’ll aggregate nc by LM, then make a neighbour object for the aggregate units:
aggLM <- aggregate(nc[,"LM"], list(nc$LM), head, n=1)
(aggnb <- poly2nb(aggLM))
## Neighbour list object:
## Number of regions: 12
## Number of nonzero links: 46
## Percentage nonzero weights: 31.94444
## Average number of links: 3.833333
Next, draw a random sample for the aggregated units:
set.seed(1)
LMrand <- rnorm(nrow(aggLM))
Check that it does not show any spatial autocorrelation:
moran.test(LMrand, nb2listw(aggnb, style="B"))
##
## Moran I test under randomisation
##
## data: LMrand
## weights: nb2listw(aggnb, style = "B")
##
## Moran I statistic standard deviate = -0.24911, p-value = 0.5984
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## -0.13101880 -0.09090909 0.02592483
Copy it out to the full data set, indexing on values of LM; the pattern now looks pretty autocorrelated:
nc$LMrand <- LMrand[match(nc$LM, aggLM$LM)]
plot(nc[,"LMrand"])
which it is:
moran.test(nc$LMrand, listw=lw, alternative="two.sided")
##
## Moran I test under randomisation
##
## data: nc$LMrand
## weights: lw
##
## Moran I statistic standard deviate = 9.4301, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.575091242 -0.010101010 0.003850884
Again, we’ve manipulated ourselves into a situation with abundant spatial autocorrelation at the level of the counties, but only by copying out from a more aggregated level. What is going on?
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Fedora 34 (Workstation Edition)
##
## Matrix products: default
## BLAS: /home/rsb/topics/R/R410-share/lib64/R/lib/libRblas.so
## LAPACK: /home/rsb/topics/R/R410-share/lib64/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] gstat_2.0-7 spatstat_2.1-0 spatstat.linnet_2.1-1
## [4] spatstat.core_2.1-2 rpart_4.1-15 nlme_3.1-152
## [7] spatstat.geom_2.1-0 spatstat.data_2.1-0 ggplot2_3.3.3
## [10] igraph_1.2.6 Matrix_1.3-3 spdep_1.1-8
## [13] spData_0.3.8 sp_1.4-5 tmap_3.3-1
## [16] tidycensus_1.0 sf_0.9-9
##
## loaded via a namespace (and not attached):
## [1] leafem_0.1.3 colorspace_2.0-1 deldir_0.2-10
## [4] ellipsis_0.3.2 class_7.3-19 leaflet_2.0.4.1
## [7] rgdal_1.5-25 fftwtools_0.9-11 base64enc_0.1-3
## [10] dichromat_2.0-0 rstudioapi_0.13 proxy_0.4-25
## [13] farver_2.1.0 fansi_0.4.2 xml2_1.3.2
## [16] codetools_0.2-18 splines_4.1.0 knitr_1.33
## [19] polyclip_1.10-0 jsonlite_1.7.2 tmaptools_3.1-1
## [22] broom_0.7.6 png_0.1-7 spatstat.sparse_2.0-0
## [25] readr_1.4.0 compiler_4.1.0 httr_1.4.2
## [28] backports_1.2.1 assertthat_0.2.1 cli_2.5.0
## [31] leaflet.providers_1.9.0 htmltools_0.5.1.1 tools_4.1.0
## [34] coda_0.19-4 gtable_0.3.0 glue_1.4.2
## [37] dplyr_1.0.6 rappdirs_0.3.3 gmodels_2.18.1
## [40] Rcpp_1.0.6 jquerylib_0.1.4 raster_3.4-10
## [43] vctrs_0.3.8 gdata_2.18.0 tigris_1.4
## [46] leafsync_0.1.0 crosstalk_1.1.1 lwgeom_0.2-6
## [49] xfun_0.23 stringr_1.4.0 rvest_1.0.0
## [52] lifecycle_1.0.0 gtools_3.8.2 goftest_1.2-2
## [55] XML_3.99-0.6 zoo_1.8-9 LearnBayes_2.15.1
## [58] MASS_7.3-54 scales_1.1.1 hms_1.1.0
## [61] spatstat.utils_2.1-0 parallel_4.1.0 expm_0.999-6
## [64] RColorBrewer_1.1-2 spatialreg_1.1-8 yaml_2.2.1
## [67] gridExtra_2.3 sass_0.4.0 stringi_1.6.2
## [70] highr_0.9 maptools_1.1-1 spDataLarge_0.5.1
## [73] e1071_1.7-7 boot_1.3-28 intervals_0.15.2
## [76] rlang_0.4.11 pkgconfig_2.0.3 evaluate_0.14
## [79] lattice_0.20-44 tensor_1.5 purrr_0.3.4
## [82] htmlwidgets_1.5.3 labeling_0.4.2 tidyselect_1.1.1
## [85] magrittr_2.0.1 R6_2.5.0 generics_0.1.0
## [88] DBI_1.1.1 mgcv_1.8-35 pillar_1.6.1
## [91] foreign_0.8-81 withr_2.4.2 xts_0.12.1
## [94] units_0.7-1 stars_0.5-2 abind_1.4-5
## [97] spacetime_1.2-4 tibble_3.1.2 crayon_1.4.1
## [100] uuid_0.1-4 KernSmooth_2.23-20 utf8_1.2.1
## [103] rmarkdown_2.8 grid_4.1.0 FNN_1.1.3
## [106] digest_0.6.27 classInt_0.4-3 tidyr_1.1.3
## [109] dbscan_1.1-8 munsell_0.5.0 viridisLite_0.4.0
## [112] bslib_0.2.5.1