Copyright

All the material presented here, to the extent it is original, is available under CC-BY-SA. Parts build on joint tutorials with Edzer Pebesma.

Required current contributed CRAN packages:

I am running R 3.6.1, with recent update.packages().

needed <- c("igraph", "tmap", "spdep", "spData", "sp", "gstat", "sf", "spatstat", "spatstat.data")

Script

Script and data at https://github.com/rsbivand/ECS530_h19/raw/master/ECS530_VI.zip. Download to suitable location, unzip and use as basis.

Schedule

2/12 (I) Spatial data representation, (II) Support+topology, input/output
3/12 (III) Coordinate reference systems, (IV) Visualization
4/12 (VI) Spatial autocorrelation, project surgery
5/12 (VII) Spatial regression, (VIII) Spatial multilevel regression
6/12 (IX) Interpolation, point processes, project surgery
7/12 Presentations

Session VI

09:15-09:45 Does spatial autocorrelation really occur?
09:45-10:30 Creating neighbour and weights objects
10:30-11:00 Tests of spatial autocorrelation

Is spatial autocorrelation just poor data collection design and/or model mis-specification?

Spatial, time series, and spatio-temporal data breaks the fundamental rule of independence of observations (social networks do too)
Often, we are not sampling from a known population to get observations, so caution is needed for inference
Often again, the observation units or entities are not chosen by us, but by others
Designing spatial samples has become a less important part of the field than previously (Ripley 1981; Müller 2007)

Spatial modelling: why?

If we want to detect and classify patterns (ML), infer about covariates, or interpolate from a fitted model, we need models that take account of the possible interdependency of spatial, time series and spatio-temporal observations
Here we are focussing on the spatial case; time differs in having a direction of flow, and spatio-temporal processes are necessarily more complex, especially when non-separable
We will not look at machine learning issues, although the partition into training and test sets raises important spatial questions for tuning (Schratz et al. 2019)

Spatial modelling: which kinds?

Spatial point processes are most closely tied to the relative positions of the observations, but may accommodate inhomegeneities
Geostatistics is concerned with interpolating values of a variable of interest at unobserved locations, and the relative positions of the observations contribute through the modelling a function of differences in observed values of that variable between observations at increasing distances from each other
Disease mapping, spatial econometrics/regression and the application of generalized linear mixed models (GLMM, GAMM) in for example ecology are more interested in inferences about the spatial processes in play and the included covariates; some approaches may use distance based spatial correlation functions, others use graph or lattice neighbourhoods

When is a “spatial” process actually induced by the analyst

Sometimes when we use spatial data, we jump to spatial statistical methods because of the widely cited first law of geography, that nearby observations are more likely to be similar than those further away
But maybe what we see as clustering, patchiness, pattern, that looks spatial is actually mis-specification, such as a missing covariate, and/or inappropriate functional form, and/or including variables acting at different scales, and/or consequences of suboptimal bounding of tesselated observations …
Here, we’ll first look at the consequences of treating inhomogeneous point as homogeneous (the intensity trends upwards with x)
Then we’ll use those points to see what happens in adding a similar trend to a random variable; empirical variograms are constructed ignoring and including the trend, and tests for global and local spatial autocorrelation are calculated

Spatial point processes

We can start by using spatstat (Baddeley, Rubak, and Turner 2015) to generate completely spatially random (CSR) points with intensity increasing with x to introduce trend inhomogeneity in a unit square (note the use of set.seed():

suppressPackageStartupMessages(library(spatstat))
intenfun <- function(x, y) 200 * x
set.seed(1)
(ihpp <- rpoispp(intenfun, lmax = 200))

## Planar point pattern: 95 points
## window: rectangle = [0, 1] x [0, 1] units

plot(density(ihpp), axes=TRUE)
points(ihpp, col="green2", pch=19)

We can use \(\hat{K}\) tests ignoring and including inhomogeneity to adapt the test to the underlying data generation process. The homogeneous test examines the divergence from a theoretical CSR at distance bins, the inhomogeneous tries to guess the kind of patterning in the relative positions of the point observations. If we ignore the inhomogeneity, we find significant clustering at most distances, with the opposite finding when using the test attempting to accommodate inhomogeneity:

opar <- par(mfrow=c(1,2))
plot(envelope(ihpp, Kest, verbose=FALSE), main="Homogeneous")
plot(envelope(ihpp, Kinhom, verbose=FALSE), main="Inhomogeneous")

par(opar)

Adding a `y` trend variable

Coercing the spatstat "ppp" object to an sf "sf" object is trivial, but first adds the window of the point process, which we need to drop by retaining only those labelled "point". Then we take the x coordinate and use it to create y, which trends with x; the DGP is not very noisy.

library(sf)

## Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1

#st_as_sf(ihpp) %>% dplyr::filter(label == "point") -> sf_ihpp
st_as_sf(ihpp) ->.; .[.$label == "point",] -> sf_ihpp
crds <- st_coordinates(sf_ihpp)
sf_ihpp$x <- crds[,1]
sf_ihpp$y <- 100 + 50 * sf_ihpp$x + 20 * rnorm(nrow(sf_ihpp))
plot(sf_ihpp[,"y"], pch=19)

Variogram model

Variograms for models ignoring (y ~ 1) and including (y ~ x) the trend; if we ignore the trend, we find spurious relationships, shown by loess() fits

suppressPackageStartupMessages(library(gstat))
vg0 <- variogram(y ~ 1, sf_ihpp)
vg1 <- variogram(y ~ x, sf_ihpp)
opar <- par(mfrow=c(1,2))
plot(gamma ~ dist, vg0, ylim=c(0,550), main="Trend ignored")
s <- seq(0, 0.5, 0.01)
p0 <- predict(loess(gamma ~ dist, vg0, weights=np), newdata=data.frame(dist=s), se=TRUE)
lines(s, p0$fit)
lines(s, p0$fit-2*p0$se.fit, lty=2)
lines(s, p0$fit+2*p0$se.fit, lty=2)
plot(gamma ~ dist, vg1, ylim=c(0,550), main="Trend included")
p1 <- predict(loess(gamma ~ dist, vg1, weights=np), newdata=data.frame(dist=s), se=TRUE)
lines(s, p1$fit)
lines(s, p1$fit-2*p1$se.fit, lty=2)
lines(s, p1$fit+2*p1$se.fit, lty=2)

par(opar)

Spatial autocorrelation

Going on from a continuous to a discrete treatment of space, we can use functions in spdep to define neighbours and then test for global and local spatial autocorrelation. Making first a triangulated neighbour objects, we can thin out improbable neighbours lying too far from one another using a sphere of influence graph to make a symmetric neighbour object:

suppressPackageStartupMessages(library(spdep))
nb_tri <- tri2nb(crds)

(nb_soi <- graph2nb(soi.graph(nb_tri, crds), sym=TRUE))

## Neighbour list object:
## Number of regions: 95 
## Number of nonzero links: 302 
## Percentage nonzero weights: 3.34626 
## Average number of links: 3.178947

The sphere of influence graph generates three subgraphs; a bit unfortunate, but we can live with that (for now):

plot(nb_soi, crds)

comps <- n.comp.nb(nb_soi)
sf_ihpp$comps <- comps$comp.id
comps$nc

## [1] 3

Using binary weights, looking at the variable but ignoring the trend, we find strong global spatial autocorrelation using global Moran’s \(I\) (tests return "htest" objects tidied here using broom)

lwB <- nb2listw(nb_soi, style="B")
out <- broom::tidy(moran.test(sf_ihpp$y, listw=lwB, randomisation=FALSE, alternative="two.sided"))[1:5]
names(out)[1:3] <- c("Moran's I", "Expectation", "Variance"); out

## # A tibble: 1 x 5
##   `Moran's I` Expectation Variance statistic    p.value
##         <dbl>       <dbl>    <dbl>     <dbl>      <dbl>
## 1       0.342     -0.0106  0.00633      4.44 0.00000917

If, however, we take the residuals after including the trend using a linear model, the apparent spatial autocorrelation evaporates

lm_obj <- lm(y ~ x, data=sf_ihpp)
out <- broom::tidy(lm.morantest(lm_obj, listw=lwB, alternative="two.sided"))[1:5]
names(out)[1:3] <- c("Moran's I", "Expectation", "Variance"); out

## # A tibble: 1 x 5
##   `Moran's I` Expectation Variance statistic[,1] p.value[,1]
##         <dbl>       <dbl>    <dbl>         <dbl>       <dbl>
## 1      0.0579     -0.0199  0.00623         0.986       0.324

The same happens if we calculate local Moran’s \(I\) ignoring the trend (randomisation assumption), and including the trend (normality assumption and saddlepoint approximation)

lmor0 <- localmoran(sf_ihpp$y, listw=lwB, alternative="two.sided")
lmor1 <- as.data.frame(localmoran.sad(lm_obj, nb=nb_soi, style="B", alternative="two.sided"))
sf_ihpp$z_value <- lmor0[,4]
sf_ihpp$z_lmor1_N <- lmor1[,2]
sf_ihpp$z_lmor1_SAD <- lmor1[,4]

suppressPackageStartupMessages(library(tmap))
tm_shape(sf_ihpp) + tm_symbols(col=c("z_value", "z_lmor1_N", "z_lmor1_SAD"), midpoint=0) + tm_facets(free.scales=FALSE, nrow=1) + tm_layout(panel.labels=c("No trend", "Trend, normal", "Trend, SAD"))

Look at the North Carolina SIDS data vignette for background:

library(sf)
nc <- st_read(system.file("shapes/sids.shp", package="spData")[1], quiet=TRUE)
st_crs(nc) <- "+proj=longlat +datum=NAD27"
row.names(nc) <- as.character(nc$FIPSNO)
head(nc)

## Simple feature collection with 6 features and 22 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs
##       CNTY_ID  AREA PERIMETER CNTY_        NAME  FIPS FIPSNO CRESS_ID BIR74
## 37009    1825 0.114     1.442  1825        Ashe 37009  37009        5  1091
## 37005    1827 0.061     1.231  1827   Alleghany 37005  37005        3   487
## 37171    1828 0.143     1.630  1828       Surry 37171  37171       86  3188
## 37053    1831 0.070     2.968  1831   Currituck 37053  37053       27   508
## 37131    1832 0.153     2.206  1832 Northampton 37131  37131       66  1421
## 37091    1833 0.097     1.670  1833    Hertford 37091  37091       46  1452
##       SID74 NWBIR74 BIR79 SID79 NWBIR79 east north      x       y       lon
## 37009     1      10  1364     0      19  164   176 -81.67 4052.29 -81.48594
## 37005     0      10   542     3      12  183   182 -50.06 4059.70 -81.14061
## 37171     5     208  3616     6     260  204   174 -16.14 4043.76 -80.75312
## 37053     1     123   830     2     145  461   182 406.01 4035.10 -76.04892
## 37131     9    1066  1606     3    1197  385   176 281.10 4029.75 -77.44057
## 37091     7     954  1838     5    1237  411   176 323.77 4028.10 -76.96474
##            lat L_id M_id                       geometry
## 37009 36.43940    1    2 MULTIPOLYGON (((-81.47276 3...
## 37005 36.52443    1    2 MULTIPOLYGON (((-81.23989 3...
## 37171 36.40033    1    2 MULTIPOLYGON (((-80.45634 3...
## 37053 36.45655    1    4 MULTIPOLYGON (((-76.00897 3...
## 37131 36.38799    1    4 MULTIPOLYGON (((-77.21767 3...
## 37091 36.38189    1    4 MULTIPOLYGON (((-76.74506 3...

The variables are largely count data, L_id and M_id are grouping variables. We can also read the original neighbour object:

library(spdep)
gal_file <- system.file("weights/ncCR85.gal", package="spData")[1]
ncCR85 <- read.gal(gal_file, region.id=nc$FIPSNO)
ncCR85

## Neighbour list object:
## Number of regions: 100 
## Number of nonzero links: 492 
## Percentage nonzero weights: 4.92 
## Average number of links: 4.92

plot(st_geometry(nc), border="grey")
plot(ncCR85, st_centroid(st_geometry(nc), of_largest_polygon), add=TRUE, col="blue")

## Warning in st_centroid.sfc(st_geometry(nc), of_largest_polygon): st_centroid
## does not give correct centroids for longitude/latitude data

Now generate a random variable. Here I’ve set the seed - maybe choose your own, and compare outcomes with the people around you. With many people in the room, about 5 in 100 may get a draw that is autocorrelated when tested with Moran’s \(I\) (why?).

set.seed(1)
nc$rand <- rnorm(nrow(nc))
lw <- nb2listw(ncCR85, style="B")
moran.test(nc$rand, listw=lw, alternative="two.sided")

## 
##  Moran I test under randomisation
## 
## data:  nc$rand  
## weights: lw    
## 
## Moran I statistic standard deviate = -0.82038, p-value = 0.412
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      -0.060754158      -0.010101010       0.003812269

Now we’ll create a trend (maybe try plotting LM to see the trend pattern). Do we get different test outcomes by varying beta and sigma (alpha is constant).

nc$LM <- as.numeric(interaction(nc$L_id, nc$M_id))
alpha <- 1
beta <- 0.5
sigma <- 2
nc$trend <- alpha + beta*nc$LM + sigma*nc$rand
moran.test(nc$trend, listw=lw, alternative="two.sided")

## 
##  Moran I test under randomisation
## 
## data:  nc$trend  
## weights: lw    
## 
## Moran I statistic standard deviate = 6.1163, p-value = 9.58e-10
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.368410659      -0.010101010       0.003829901

To get back to reality, include the trend in a linear model, and test again.

lm.morantest(lm(trend ~ LM, nc), listw=lw, alternative="two.sided")

## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = trend ~ LM, data = nc)
## weights: lw
## 
## Moran I statistic standard deviate = -0.50788, p-value = 0.6115
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     -0.049567147     -0.018718176      0.003689356

So we can manipulate a missing variable mis-specification to look like spatial autocorrelation. Is this informative?

Sometimes we only have data on a covariate for aggregates of our units of observation. What happens when we “copy out” these aggregate values to the less aggregated observations? First we’ll aggregate nc by LM, then make a neighbour object for the aggregate units

aggLM <- aggregate(nc[,"LM"], list(nc$LM), head, n=1)
(aggnb <- poly2nb(aggLM))

## Neighbour list object:
## Number of regions: 12 
## Number of nonzero links: 46 
## Percentage nonzero weights: 31.94444 
## Average number of links: 3.833333

Next, draw a random sample for the aggregated units:

set.seed(1)
LMrand <- rnorm(nrow(aggLM))

Check that it does not show any spatial autocorrelation:

moran.test(LMrand, nb2listw(aggnb, style="B"))

## 
##  Moran I test under randomisation
## 
## data:  LMrand  
## weights: nb2listw(aggnb, style = "B")    
## 
## Moran I statistic standard deviate = -0.24911, p-value = 0.5984
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       -0.13101880       -0.09090909        0.02592483

Copy it out to the full data set, indexing on values of LM; the pattern now looks pretty autocorrelated

nc$LMrand <- LMrand[match(nc$LM, aggLM$LM)]
plot(nc[,"LMrand"])

which it is:

moran.test(nc$LMrand, listw=lw, alternative="two.sided")

## 
##  Moran I test under randomisation
## 
## data:  nc$LMrand  
## weights: lw    
## 
## Moran I statistic standard deviate = 9.4301, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.575091242      -0.010101010       0.003850884

Again, we’ve manipulated ourselves into a situation with abundant spatial autocorrelation at the level of the counties, but only by copying out from a more aggregated level. What is going on?

Graph/weight-based neighbours; spatially structured random effects

Graph/weight-based neighbours

Some spatial processes are best represented by decreasing functions of distance
Other spatial processes best represent proximity, contiguity, neighbourhood; the functions then relate to steps along edges on graphs of neighbours
Under certain conditions, the power series \(\rho^i {\mathbf W}^i\) declines in intensity as \(i\) increases from \(0\)
Here we will use areal support, corresponding to a proximity view of spatial processes (Wall 2004)

Polish presidential election 2015 data set

We’ll use a typical moderate sized data set of Polish municipalities (including boroughs in the capital, Warsaw), included in spDataLarge as used in Geocomputation with R (Lovelace, Nowosad, and Muenchow 2019)

library(sf)
# if(!require("spData", quietly=TRUE)) install.packages("spData")
# if(!require("spDataLarge", quietly=TRUE)) install.packages("spDataLarge",
#   repos = "https://nowosad.github.io/drat/", type = "source")
data(pol_pres15, package="spDataLarge")
pol_pres15 <- st_buffer(pol_pres15, dist=0)
head(pol_pres15[, c(1, 4, 6)])

## Simple feature collection with 6 features and 3 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 235157.1 ymin: 366913.3 xmax: 281431.7 ymax: 413016.6
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=0 +lon_0=18.99999999999998 +k=0.9993 +x_0=500000 +y_0=-5300000 +ellps=GRS80 +towgs84=0,0,0 +units=m +no_defs
##    TERYT                name       types                       geometry
## 1 020101         BOLESŁAWIEC       Urban POLYGON ((261089.5 385540.3...
## 2 020102         BOLESŁAWIEC       Rural POLYGON ((254150 383742.4, ...
## 3 020103            GROMADKA       Rural POLYGON ((275346 384616, 26...
## 4 020104        NOWOGRODZIEC Urban/rural POLYGON ((251769.8 377084.8...
## 5 020105          OSIECZNICA       Rural POLYGON ((263423.9 406004.7...
## 6 020106 WARTA BOLESŁAWIECKA       Rural POLYGON ((267030.7 387030.3...

The dataset has 2495 observations with areal support on 65 variables, most counts of election results aggregated from the election returns by polling station. See:

?spDataLarge::pol_pres15

for details.

Contiguous, proximate neighbours

The spdep function poly2nb() finds proximate, contiguous neighbours by finding one (queen) or two (rook) boundary points within a snap distance of each other for polygons in a candidate list based on intersecting bounding boxes:

suppressPackageStartupMessages(library(spdep))
system.time(nb_q <- poly2nb(pol_pres15, queen=TRUE))

##    user  system elapsed 
##   1.650   0.033   1.875

The object returned is very sparse; the print method reports asymmetric objects and objects with observations with no neighbours.

nb_q

## Neighbour list object:
## Number of regions: 2495 
## Number of nonzero links: 14242 
## Percentage nonzero weights: 0.2287862 
## Average number of links: 5.708216

opar <- par(mar=c(0,0,0,0)+0.5)
plot(st_geometry(pol_pres15), border="grey", lwd=0.5)
coords <- st_centroid(st_geometry(pol_pres15),
  of_largest_polygon=TRUE)
plot(nb_q, coords=st_coordinates(coords), add=TRUE, points=FALSE, lwd=0.5)

par(opar)

We can use GEOS through sf to improve the speed of detection of candidate neighbours, by building a geometry column of bounding boxes of the underlying polygons and finding which intersect. We pass this object to poly2nb() through the foundInBox= argument. This may be useful for larger data sets.

system.time({
  fB1 <- st_intersects(st_as_sfc(lapply(st_geometry(pol_pres15), function(x) {
    st_as_sfc(st_bbox(x))[[1]]
  })))
  fB1a <- lapply(seq_along(fB1), function(i) fB1[[i]][fB1[[i]] > i])
  fB1a <- fB1a[-length(fB1a)]
  nb_sf_q1 <- poly2nb(pol_pres15, queen=TRUE, foundInBox=fB1a)
})

##    user  system elapsed 
##   1.215   0.015   1.430

The two neighbour objects are the same:

all.equal(nb_q, nb_sf_q1, check.attributes=FALSE)

## [1] TRUE

We can further check the object for the number of distinct graph components; there is only one, so each observation can be reached from every other observation by traversing the graph edges:

n.comp.nb(nb_q)$nc

## [1] 1

The "listw" spatial weights object needed for testing or modelling can be created using nb2listw(), here with binary weights, \(1\) for first-order contiguity, \(0\) for not a neighbour:

lw <- nb2listw(nb_q, style="B")

We can coerce this further to a sparse matrix as defined in the Matrix package:

library(Matrix)
W <- as(lw, "CsparseMatrix")

and check symmetry directly, (t() is the transpose method):

isTRUE(all.equal(W, t(W)))

## [1] TRUE

Powering up the sparse matrix fills in the higher order neighbours:

image(W)

WW <- W %*% W
image(WW)

W3 <- WW %*% W
image(W3)

W4 <- W3 %*% W
image(W4)

There is an spdep vignette considering the use of the igraph adjacency representation through sparse matrices as intermediate objects:

library(igraph)
g1 <- graph.adjacency(W, mode="undirected")
class(g1)

## [1] "igraph"

We can get an "nb" neighbour object back from the adjacency representation:

B1 <- get.adjacency(g1)
mat2listw(B1)$neighbours

## Neighbour list object:
## Number of regions: 2495 
## Number of nonzero links: 14242 
## Percentage nonzero weights: 0.2287862 
## Average number of links: 5.708216

There is a single graph component:

c1 <- clusters(g1)
c1$no

## [1] 1

The graph is connected:

is.connected(g1)

## [1] TRUE

with diameter:

dg1 <- diameter(g1)
dg1

## [1] 52

The diameter is the longest path across the graph, where shortest.paths() returns a matrix of all the edge counts on shortest paths between observations:

sp_mat <- shortest.paths(g1)
max(sp_mat)

## [1] 52

str(sp_mat)

##  num [1:2495, 1:2495] 0 1 2 2 2 2 10 10 9 10 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:2495] "1" "2" "3" "4" ...
##   ..$ : chr [1:2495] "1" "2" "3" "4" ...

So graphs really do contain a lot of structural information, rendering most IDW cases superfluous.

Spatially structured random effects

The hglm hierarchical generalized linear model package uses the "CsparseMatrix" version of spatial weights for fitting "SAR" or "CAR" spatially structured random effects

The CARBayes package appears to take dense matrices using listw2mat() or nb2mat(), preferring binary matrices

The "mrf" random effect in the mgcv package for use with gam() takes an "nb" object directly

The R2BayesX::nb2gra() function converts an "nb" neighbour object into a graph for use with BayesX through R2BayesX, but creates a dense matrix; the graph is passed as the map= argument to the "mrf" structured random effect

gra <- R2BayesX::nb2gra(ncCR85)
str(gra)

##  'gra' num [1:100, 1:100] 3 -1 0 0 0 0 0 0 0 0 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:100] "37009" "37005" "37171" "37053" ...
##   ..$ : chr [1:100] "37009" "37005" "37171" "37053" ...

The nb2INLA() function in spdep writes a file that INLA can use through the INLA package, used with the "besag" model, for example:

tf <- tempfile()
nb2INLA(tf, ncCR85)
file.size(tf)

## [1] 1945

What does it mean that this neighbour object not symmetric?

coords <- st_centroid(st_geometry(nc), of_largest_polygon)
(knn_5_nb <- knn2nb(knearneigh(st_coordinates(coords), k=5)))

## Neighbour list object:
## Number of regions: 100 
## Number of nonzero links: 500 
## Percentage nonzero weights: 5 
## Average number of links: 5 
## Non-symmetric neighbours list

klw <- nb2listw(knn_5_nb, style="B")
kW <- as(klw, "CsparseMatrix")
isTRUE(all.equal(kW, t(kW)))

## [1] FALSE

image(kW)

We need to recall that non-symmetric neighbours give directed graphs; if we forget, and treat it as undirected, the extra edges get added (try it):

library(igraph)
g1 <- graph.adjacency(kW, mode="directed")
B1 <- get.adjacency(g1)
mat2listw(B1)$neighbours

## Neighbour list object:
## Number of regions: 100 
## Number of nonzero links: 500 
## Percentage nonzero weights: 5 
## Average number of links: 5 
## Non-symmetric neighbours list

Use igraph functions to explore the ncCR85 "nb" object:

diameter(g1)

## [1] 20

nc_sps <- shortest.paths(g1)
mr <- which.max(apply(nc_sps, 2, max))
nc$sps1 <- nc_sps[,mr]
plot(nc[,"sps1"], breaks=0:21)

plot(nc$sps1, c(st_distance(coords[mr], coords))/1000, xlab="shortest path count", ylab="km distance")

Discuss whether measured distance is really needed to express proximity; the graph shortest paths look fine.

Make objects from the imported neighbours for BayesX and INLA, and as a sparse and dense matrix:

ncCR85a <- ncCR85
attr(ncCR85a, "region.id") <- as.character(nc$CRESS_ID)
nc_gra <- R2BayesX::nb2gra(ncCR85a)
nc_tf <- tempfile()
nb2INLA(nc_tf, ncCR85)
nc_lw <- nb2listw(ncCR85, style="B")
nc_W <- as(nc_lw, "CsparseMatrix")
nc_mat <- listw2mat(nc_lw)

Let’s start by seeing whether there is any spatial autocorrelation in the SIDS rate (modified Freeman-Tukey square root transformation):

nc$ft.SID74 <- sqrt(1000)*(sqrt(nc$SID74/nc$BIR74) + sqrt((nc$SID74+1)/nc$BIR74))
nc$ft.NWBIR74 <- sqrt(1000)*(sqrt(nc$NWBIR74/nc$BIR74) + sqrt((nc$NWBIR74+1)/nc$BIR74))
tm_shape(nc) + tm_fill(c("ft.SID74", "ft.NWBIR74"))

plot(ft.SID74 ~ ft.NWBIR74, nc)

First Moran’s \(I\) with the intercept as

moran.test(nc$ft.SID74, nc_lw, alternative="two.sided", randomisation=FALSE)

## 
##  Moran I test under normality
## 
## data:  nc$ft.SID74  
## weights: nc_lw    
## 
## Moran I statistic standard deviate = 3.5842, p-value = 0.0003381
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.211279611      -0.010101010       0.003814926

Now for a mean model accommodating case weights:

lm.morantest(lm(ft.SID74 ~ 1, weights=BIR74, data=nc), nc_lw, alternative="two.sided")

## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = ft.SID74 ~ 1, data = nc, weights = BIR74)
## weights: nc_lw
## 
## Moran I statistic standard deviate = 4.5965, p-value = 4.297e-06
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I      Expectation         Variance 
##      0.273332891     -0.009631207      0.003789801

Next just with the original covariate (transformed non-white births as a proportion of all births):

lm.morantest(lm(ft.SID74 ~ ft.NWBIR74, data=nc), nc_lw, alternative="two.sided")

## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = ft.SID74 ~ ft.NWBIR74, data = nc)
## weights: nc_lw
## 
## Moran I statistic standard deviate = 1.2148, p-value = 0.2245
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I      Expectation         Variance 
##      0.057042251     -0.016844710      0.003699627

Finally with the covariate and case weights:

lm.morantest(lm(ft.SID74 ~ ft.NWBIR74, weights=BIR74, data=nc), nc_lw, alternative="two.sided")

## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = ft.SID74 ~ ft.NWBIR74, data = nc, weights = BIR74)
## weights: nc_lw
## 
## Moran I statistic standard deviate = 1.7839, p-value = 0.07443
## alternative hypothesis: two.sided
## sample estimates:
## Observed Moran I      Expectation         Variance 
##      0.091350263     -0.016506328      0.003655419

So something spatial appears to be present, but how to model it?

Baddeley, Adrian, Ege Rubak, and Rolf Turner. 2015. Spatial Point Patterns: Methodology and Applications with R. Boca Raton, FL: Chapman and Hall/CRC.

Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. Geocomputation with R. Boca Raton, FL: Chapman and Hall/CRC. https://geocompr.robinlovelace.net/.

Müller, Werner G. 2007. Collecting Spatial Data: Optimum Design of Experiments for Random Fields. Berlin: Springer-Verlag.

Ripley, B. D. 1981. Spatial Statistics. New York: Wiley.

Schratz, Patrick, Jannes Muenchow, Eugenia Iturritxa, Jakob Richter, and Alexander Brenning. 2019. “Hyperparameter Tuning and Performance Assessment of Statistical and Machine-Learning Algorithms Using Spatial Data.” Ecological Modelling 406: 109–20. https://doi.org/https://doi.org/10.1016/j.ecolmodel.2019.06.002.

Wall, M. M. 2004. “A Close Look at the Spatial Structure Implied by the CAR and SAR Models.” Journal of Statistical Planning and Inference 121: 311–24.

ECS530: (VI) Spatial autocorrelation

Roger Bivand

Wednesday 4 December 2019, 09:15-11.00, aud. C

Copyright

Required current contributed CRAN packages:

Script

Schedule

Session VI

Is spatial autocorrelation just poor data collection design and/or model mis-specification?

Spatial modelling: why?

Spatial modelling: which kinds?

When is a “spatial” process actually induced by the analyst

Spatial point processes

Adding a `y` trend variable

Variogram model

Spatial autocorrelation

Graph/weight-based neighbours; spatially structured random effects

Graph/weight-based neighbours

Polish presidential election 2015 data set

Contiguous, proximate neighbours

Spatially structured random effects

ECS530: (VI) Spatial autocorrelation

Roger Bivand

Wednesday 4 December 2019, 09:15-11.00, aud. C

Copyright

Required current contributed CRAN packages:

Script

Schedule

Session VI

Is spatial autocorrelation just poor data collection design and/or model mis-specification?

Spatial modelling: why?

Spatial modelling: which kinds?

When is a “spatial” process actually induced by the analyst

Spatial point processes

Adding a y trend variable

Variogram model

Spatial autocorrelation

Graph/weight-based neighbours; spatially structured random effects

Graph/weight-based neighbours

Polish presidential election 2015 data set

Contiguous, proximate neighbours

Spatially structured random effects

Adding a `y` trend variable