Copyright

All the material presented here, to the extent it is original, is available under CC-BY-SA.

Required current contributed CRAN packages:

I am running R 4.1.0, with recent update.packages().

needed <- c("coda", "INLABMA", "INLA", "foreach", "sphet", "ggplot2",
"R2BayesX", "colorspace", "BayesXsrc", "HSAR", "hglm", "hglm.data",
"sp", "MatrixModels", "lme4", "spmoran", "spatialreg", "spData", "tmap", "sf")

Script

Script and data at https://github.com/rsbivand/UAM21_II/raw/main/UAM21_II_210615.zip. Download to suitable location, unzip and use as basis.

Seminar introduction

Schedule

Time	Topic
Monday 7/6
09.00-12.00	How may we integrate spatial data from different sources? How may we aggregate spatial data? Which data structures are helpful in handling spatial data?
13.00-16.00	Observations of spatial data are related to each other either by distance, or by a graph of edges linking observations seen as being neighbours. How may they be constructed? How may we address the issues raised by the probable presence of spatial autocorrelation in the spatial data that we are using?
Tuesday 8/6
09.00-12.00	How can we measure global spatial autocorrelation?
13.00-16.00	How can we measure local spatial autocorrelation?
Monday 14/6
09.00-12.00	How may we fit regression models to spatial data in the presence of spatial autocorrelation? Maximum likelihood and spatial filtering, case weights.
13.00-16.00	How should we interpret the coefficients or impacts of spatial regression models? How may we predict from spatial regession models?
Tuesday 15/6
09.00-12.00	Multi-level models: at which level in the data may we fit spatial processes?
13.00-16.00	Spatial filtering, hierarchical GLM, GAM, etc., spatial epidemiological approaches
Monday 21/6
09.00-12.00	Presentations/consultations/discussion
13.00-16.00	Presentations/consultations/discussion

Spatial filtering

Spatial filtering methods as developed by Daniel A. Griffith (2003) build on using standard linear and generalized linear models supplemented with selected eigenvectors from the spatial weights matrix. In Patuelli et al. (2012), D. A. Griffith (2010) and D. A. Griffith and Paelinck (2011), examples were given of how standard and non-standard spatial econometric problems could be approached using spatial filtering. Tiefelsdorf and Griffith (2007) proposed that the eigenvectors for inclusion should be selected by their ability to reduce residual autocorrelation rather than to increase model fit. This approach was implemented by Yongwan Chun and Michael Tiefelsdorf in spdep and moved to spatialreg, with two steps, first to select eigenvectors taken from the spatial weights matrix doubly centred using the hat matrix of the actual regression, then using lm to fit the model, effectively removing residual autocorrelation:

Upper NY data

The New York 8 county data set contains population standardized leukaemia cases, with Z as a transformed rate:

library(sf)

## Linking to GEOS 3.10.0dev, GDAL 3.3.0, PROJ 8.0.1

NY8 <- st_read(system.file("shapes/NY8_utm18.shp", package="spData"))

## Reading layer `NY8_utm18' from data source 
##   `/home/rsb/lib/r_libs/spData/shapes/NY8_utm18.shp' using driver `ESRI Shapefile'
## Simple feature collection with 281 features and 17 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 358241.9 ymin: 4649755 xmax: 480393.1 ymax: 4808545
## Projected CRS: WGS 84 / UTM zone 18N

NY8 <- st_buffer(NY8, dist=0)

library(tmap)
tm_shape(NY8) + tm_fill("Z", midpoint=0)

Create a neighbour object:

NY_nb <- spdep::poly2nb(NY8)
NY_lwB <- spdep::nb2listw(NY_nb, style="B")
NY_lwW <- spdep::nb2listw(NY_nb, style="W")

Check how the SAR and CAR models behave, with and without case weights:

library(spatialreg)

## Loading required package: spData

## Loading required package: Matrix

gform <- Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME
mod1 <- spautolm(gform, data=NY8, family="CAR", listw=NY_lwB, weights=POP8)
summary(mod1)

## 
## Call: spautolm(formula = gform, data = NY8, listw = NY_lwB, weights = POP8, 
##     family = "CAR")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -1.500566 -0.274566  0.087494  0.454262  4.257081 
## 
## Coefficients: 
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept) -0.783775   0.142645 -5.4946 3.916e-08
## PEXPOSURE    0.078887   0.027922  2.8253  0.004724
## PCTAGE65P    3.841221   0.573083  6.7027 2.046e-11
## PCTOWNHOME  -0.392319   0.154995 -2.5312  0.011368
## 
## Lambda: 0.011751 LR test value: 0.11668 p-value: 0.73266 
## Numerical Hessian standard error of lambda: 0.039471 
## 
## Log likelihood: -251.7067 
## ML residual variance (sigma squared): 1105.1, (sigma: 33.243)
## Number of observations: 281 
## Number of parameters estimated: 6 
## AIC: 515.41

This data set is used Waller and Gotway (2004), and in both editions of ASDAR. It is harder to deploy Poisson models because the cases are not integer counts.

(SF0 <- SpatialFiltering(gform, data = NY8, nb=NY_nb, style="W"))

##    Step SelEvec      Eval        MinMi     ZMinMi     Pr(ZI)        R2
## 0     0       0 0.0000000  0.077160258 2.41881146 0.01557131 0.1931605
## 1     1      21 0.7617439  0.065249815 2.17039169 0.02997719 0.2069579
## 2     2      78 0.2132604  0.052273969 1.82565937 0.06790159 0.2708788
## 3     3      12 0.8593013  0.040520587 1.59403969 0.11092715 0.2813452
## 4     4      11 0.8763886  0.029200840 1.37568345 0.16891966 0.2909475
## 5     5      10 0.8936187  0.017791439 1.15578610 0.24776866 0.3001843
## 6     6      36 0.5975208  0.006126374 0.89180778 0.37249597 0.3139880
## 7     7      25 0.7045949 -0.005770779 0.63373494 0.52625382 0.3254772
## 8     8       8 0.9138669 -0.016486915 0.43237073 0.66547199 0.3332466
## 9     9      41 0.5555168 -0.027419742 0.18369582 0.85425209 0.3457514
## 10   10       4 0.9659973 -0.036297199 0.03794093 0.96973478 0.3515462
##         gamma
## 0   0.0000000
## 1  -1.4302287
## 2  -3.0784210
## 3  -1.2456734
## 4  -1.1931510
## 5   1.1702215
## 6  -1.4305515
## 7   1.3051280
## 8  -1.0732479
## 9   1.3615854
## 10  0.9268825

SF_obj2 <- lm(update(gform, . ~ . + fitted(SF0)), data = NY8)
summary(SF_obj2)

## 
## Call:
## lm(formula = update(gform, . ~ . + fitted(SF0)), data = NY8)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.42548 -0.38747 -0.03151  0.32462  3.11533 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -0.51728    0.14478  -3.573 0.000419 ***
## PEXPOSURE         0.04884    0.03202   1.525 0.128366    
## PCTAGE65P         3.95089    0.55290   7.146 8.54e-12 ***
## PCTOWNHOME       -0.56004    0.15551  -3.601 0.000378 ***
## fitted(SF0)vec21 -1.43023    0.60005  -2.383 0.017848 *  
## fitted(SF0)vec78 -3.07842    0.60005  -5.130 5.57e-07 ***
## fitted(SF0)vec12 -1.24567    0.60005  -2.076 0.038857 *  
## fitted(SF0)vec11 -1.19315    0.60005  -1.988 0.047788 *  
## fitted(SF0)vec10  1.17022    0.60005   1.950 0.052200 .  
## fitted(SF0)vec36 -1.43055    0.60005  -2.384 0.017823 *  
## fitted(SF0)vec25  1.30513    0.60005   2.175 0.030507 *  
## fitted(SF0)vec8  -1.07325    0.60005  -1.789 0.074815 .  
## fitted(SF0)vec41  1.36159    0.60005   2.269 0.024060 *  
## fitted(SF0)vec4   0.92688    0.60005   1.545 0.123612    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6001 on 267 degrees of freedom
## Multiple R-squared:  0.3515, Adjusted R-squared:   0.32 
## F-statistic: 11.13 on 13 and 267 DF,  p-value: < 2.2e-16

spdep::lm.morantest(SF_obj2, NY_lwW)

## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = update(gform, . ~ . + fitted(SF0)), data = NY8)
## weights: NY_lwW
## 
## Moran I statistic standard deviate = 0.037941, p-value = 0.4849
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     -0.036297199     -0.037620475      0.001216424

vecs <- fitted(SF0)
pvecs <- vecs %*% diag(coefficients(SF_obj2)[5:14])
colnames(pvecs) <- colnames(vecs)

Typically, the small subset of eigenvectors selected mops up spatial autocorrelation in the residual. S. Dray, Legendre, and Peres-Neto (2006) and D. A. Griffith and Peres-Neto (2006) adopt a similar approach in a generalized linear model context, implemented in spdep by Pedro Peres-Neto and moved now to spatialreg as ME analogous with SpatialFiltering, but centering the spatial weights matrix on the null model hat matrix, and using bootstrap methods in evaluating the the choice of eigenvectors. The correlations between the implied cumulated outcomes of these methods are shown in Table . Stéphane Dray et al. (2012) describe many of the underlying motivations, including the view that Moran eigenvector spatial filtering approaches may permit both spatial autocorrelation and spatial scale tto be accommodate in a single model; a further implementation is given in Stéphane Dray et al. (2021).

set.seed(22)
(ME0 <- ME(gform, data = NY8, listw=NY_lwW, alpha=0.15, nsim=99))

##   Eigenvector ZI pr(ZI)
## 0          NA NA   0.02
## 1          14 NA   0.05
## 2          10 NA   0.09
## 3          13 NA   0.19

ME_obj2 <- lm(update(gform, . ~ . + fitted(ME0)), data = NY8)
summary(ME_obj2)

## 
## Call:
## lm(formula = update(gform, . ~ . + fitted(ME0)), data = NY8)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8380 -0.4017 -0.0284  0.3515  3.9741 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -0.50990    0.16308  -3.127  0.00196 ** 
## PEXPOSURE         0.04796    0.03454   1.389  0.16611    
## PCTAGE65P         3.98900    0.60341   6.611 1.99e-10 ***
## PCTOWNHOME       -0.57883    0.17669  -3.276  0.00119 ** 
## fitted(ME0)vec14  1.68445    0.65969   2.553  0.01121 *  
## fitted(ME0)vec10 -1.24178    0.67254  -1.846  0.06591 .  
## fitted(ME0)vec13 -1.06217    0.64759  -1.640  0.10211    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6454 on 274 degrees of freedom
## Multiple R-squared:  0.2303, Adjusted R-squared:  0.2134 
## F-statistic: 13.66 on 6 and 274 DF,  p-value: 1.435e-13

mvecs <- fitted(ME0) %*% diag(coefficients(ME_obj2)[5:7])
sem_obj2a <- spautolm(gform, NY8, listw=NY_lwW)
W <- as(NY_lwW, "CsparseMatrix")
SAR_ssre <- 2*as.vector((sem_obj2a$lambda * W) %*% sem_obj2a$Y - (sem_obj2a$lambda * W) %*% (sem_obj2a$X %*% sem_obj2a$fit$coefficients))

Murakami and Griffith (2019) provide a fresher version of spatial filtering implemented in Murakami (2021). This also appears to centre the spatial weights matrix on the null model hat matrix, and chooses eigenvectors not to reduce residual autocorrelation, but chooses those among the eigenvectors with positive eigenvalues that increase model fit most up to a threshold to control overfitting. The default approach uses an exponential variogram model to generate the weights matrix from planar coordinates. The meigen function subsets the full set of eigenvectors before the data are seen, then esf calls lm itself while further subsetting the eigenvectors.

library(spmoran)
centroids_utm <- st_centroid(st_geometry(NY8))
meig <- meigen(st_coordinates(centroids_utm))

##  33/281 eigen-pairs are extracted

y <- model.response(model.frame(gform, NY8))
X <- model.matrix(update(gform, ~ .), NY8)
(esf_obj2 <- esf(y, X, meig=meig))

##   13/33 eigenvectors are selected

## Call:
## esf(y = y, x = X, meig = meig)
## 
## ----Coefficients------------------------------
##                 Estimate        SE     t_value      p_value
## (Intercept) -0.645834939 0.1653420 -3.90605508 1.192357e-04
## PEXPOSURE   -0.003338754 0.0393389 -0.08487155 9.324278e-01
## PCTAGE65P    3.815483576 0.6071078  6.28468921 1.350496e-09
## PCTOWNHOME  -0.149187135 0.2024274 -0.73699095 4.617823e-01
## 
## ----Spatial effects (residuals)---------------
##                        Estimate
## SE                   0.25038503
## Moran.I/max(Moran.I) 0.09378257
## 
## ----Error statistics--------------------------
##                  stat
## resid_SE    0.6337987
## adjR2       0.2413396
## logLik   -261.8110562
## AIC       559.6221123
## BIC       625.1124964

esf_obj2$r

##        Estimate        SE   t_value     p_value
## sf4   2.1178467 0.7512534  2.819084 0.005180899
## sf16  1.1685835 0.6394087  1.827600 0.068738267
## sf10  1.5165780 0.6639666  2.284118 0.023158813
## sf7  -0.9797156 0.6452891 -1.518258 0.130145881
## sf3  -1.2546505 0.6715339 -1.868335 0.062822932
## sf23 -0.9807425 0.6421764 -1.527217 0.127904327
## sf17 -0.9867262 0.6344423 -1.555266 0.121080772
## sf30 -0.9672723 0.6339788 -1.525717 0.128277437
## sf19  1.0235518 0.6477160  1.580248 0.115247446
## sf18  0.9472687 0.6429683  1.473274 0.141868377
## sf13  0.8214887 0.6437379  1.276123 0.203033580
## sf8  -0.8464719 0.6536142 -1.295063 0.196429731
## sf12  0.8134187 0.6831693  1.190655 0.234858689

r_used_eigs <- row.names(esf_obj2$r)
used_eigs <- as.integer(substring(r_used_eigs, 3))
evecs <- meig$sf[, used_eigs] %*% diag(esf_obj2$r[1:length(r_used_eigs),1])
colnames(evecs) <- r_used_eigs

SEs <- cbind("ESF"=apply(evecs, 1, sum), "SF"=apply(pvecs, 1, sum), "ME"=apply(mvecs, 1, sum), "SAR"=SAR_ssre)
NY8_SEs <- cbind(NY8, SEs)
mat <- cor(SEs)
colnames(mat) <- rownames(mat) <- c("ESF", "SF", "ME", "SAR")
mat

##           ESF        SF        ME       SAR
## ESF 1.0000000 0.3452012 0.4634257 0.6032419
## SF  0.3452012 1.0000000 0.4639506 0.5144771
## ME  0.4634257 0.4639506 1.0000000 0.3907094
## SAR 0.6032419 0.5144771 0.3907094 1.0000000

The correlations between the two estimates of spatially structured random effects, three cumulated spatial filtering approaches, and the spatially structured term implied by the ML estimates of the spatial error model. As can be seen, they are very similar to each other, so the choice of approach may be fairly flexible and relate more to the needs of users and their domain usages that to a single body of theory.

tm_shape(NY8_SEs) + tm_fill(c("ESF", "SF", "ME", "SAR"), n=8, palette="BrBG", midpoint=0, title="Spatial effect") + tm_facets(free.scales=FALSE) + tm_borders(lwd=0.5) + tm_layout(panel.labels=c("ESF", "SF", "ME", "SAR"), bg="grey90")

NC SIDS data

nc <- st_read(system.file("shapes/sids.shp", package="spData")[1], quiet=TRUE)
st_crs(nc) <- "EPSG:4267"
row.names(nc) <- as.character(nc$FIPSNO)
nc$ft.SID74 <- sqrt(1000)*(sqrt(nc$SID74/nc$BIR74) + sqrt((nc$SID74+1)/nc$BIR74))
nc$ft.NWBIR74 <- sqrt(1000)*(sqrt(nc$NWBIR74/nc$BIR74) + sqrt((nc$NWBIR74+1)/nc$BIR74))
gal_file <- system.file("weights/ncCR85.gal", package="spData")[1]
ncCR85 <- spdep::read.gal(gal_file, region.id=nc$FIPSNO)
nc_lw <- spdep::nb2listw(ncCR85, style="B")
nc_W <- as(nc_lw, "CsparseMatrix")

E <- nc$BIR74 * sum(nc$SID74)/sum(nc$BIR74)
set.seed(1001)
(ME_nc_p <- ME(SID74 ~ ft.NWBIR74, offset=log(E), weights=BIR74, data=nc, family=poisson(link=log), listw=nc_lw, alpha=0.25, nsim=499))

##   Eigenvector ZI pr(ZI)
## 0          NA NA  0.236
## 1          29 NA  0.376

ME_obj_nc0 <- glm(SID74 ~ ft.NWBIR74, offset=log(E), weights=BIR74, data=nc, family=poisson(link=log))
summary(ME_obj_nc0)

## 
## Call:
## glm(formula = SID74 ~ ft.NWBIR74, family = poisson(link = log), 
##     data = nc, weights = BIR74, offset = log(E))
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -313.610   -24.400     5.395    44.517   300.187  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.170e+00  2.524e-03  -463.5   <2e-16 ***
## ft.NWBIR74   2.988e-02  6.708e-05   445.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 789476  on 99  degrees of freedom
## Residual deviance: 588642  on 98  degrees of freedom
## AIC: 1934183
## 
## Number of Fisher Scoring iterations: 4

ME_obj_nc <- glm(update(SID74 ~ ft.NWBIR74, . ~ . + fitted(ME_nc_p)), offset=log(E), weights=BIR74, data=nc, family=poisson(link=log))
summary(ME_obj_nc)

## 
## Call:
## glm(formula = update(SID74 ~ ft.NWBIR74, . ~ . + fitted(ME_nc_p)), 
##     family = poisson(link = log), data = nc, weights = BIR74, 
##     offset = log(E))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -336.68   -27.88     2.30    37.86   316.34  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -1.026e+00  2.548e-03  -402.7   <2e-16 ***
## ft.NWBIR74       2.738e-02  6.622e-05   413.5   <2e-16 ***
## fitted(ME_nc_p) -9.998e-01  5.081e-03  -196.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 789476  on 99  degrees of freedom
## Residual deviance: 550383  on 97  degrees of freedom
## AIC: 1895926
## 
## Number of Fisher Scoring iterations: 4

anova(ME_obj_nc0, ME_obj_nc, test="Chisq")

## Analysis of Deviance Table
## 
## Model 1: SID74 ~ ft.NWBIR74
## Model 2: SID74 ~ ft.NWBIR74 + fitted(ME_nc_p)
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1        98     588642                          
## 2        97     550383  1    38259 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Markov random field and multilevel models with spatial weights

Boston data set

boston_506 <- st_read(system.file("shapes/boston_tracts.shp", package="spData")[1])

## Reading layer `boston_tracts' from data source 
##   `/home/rsb/lib/r_libs/spData/shapes/boston_tracts.shp' using driver `ESRI Shapefile'
## Simple feature collection with 506 features and 36 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -71.52311 ymin: 42.00305 xmax: -70.63823 ymax: 42.67307
## Geodetic CRS:  NAD27

nb_q <- spdep::poly2nb(boston_506)
lw_q <- spdep::nb2listw(nb_q, style="W")

boston_506$CHAS <- as.factor(boston_506$CHAS)
boston_489 <- boston_506[!is.na(boston_506$median),]
nb_q_489 <- spdep::poly2nb(boston_489)
lw_q_489 <- spdep::nb2listw(nb_q_489, style="W", zero.policy=TRUE)

agg_96 <- list(as.character(boston_506$NOX_ID))
boston_96 <- aggregate(boston_506[, "NOX_ID"], by=agg_96, unique)
nb_q_96 <- spdep::poly2nb(boston_96)
lw_q_96 <- spdep::nb2listw(nb_q_96)
boston_96$NOX <- aggregate(boston_506$NOX, agg_96, mean)$x
boston_96$CHAS <- aggregate(as.integer(boston_506$CHAS)-1, agg_96, max)$x

nms <- names(boston_506)
ccounts <- 23:31
for (nm in nms[c(22, ccounts, 36)]) {
  boston_96[[nm]] <- aggregate(boston_506[[nm]], agg_96, sum)$x
}
br2 <- c(3.50,  6.25,  8.75, 12.50, 17.50, 22.50, 30.00, 42.50, 60.00)*1000
counts <- as.data.frame(boston_96)[, nms[ccounts]]
f <- function(x) matrixStats::weightedMedian(x=br2, w=x, interpolate=TRUE)
boston_96$median <- apply(counts, 1, f)
is.na(boston_96$median) <- boston_96$median > 50000
summary(boston_96$median)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    9009   20417   23523   25263   30073   49496       2

boston_94 <- boston_96[!is.na(boston_96$median),]
nb_q_94 <- spdep::subset.nb(nb_q_96, !is.na(boston_96$median))
lw_q_94 <- spdep::nb2listw(nb_q_94, style="W")

There is a large literature in spatial epidemiology using CAR and ICAR models in spatially structured random effects. These extend to multilevel models, in which the spatially structured random effects may apply at different levels of the model (R. S. Bivand et al. 2017). In order to try out some of the variants, we need to remove the no-neighbour observations from the tract level, and from the model output zone aggregated level, in two steps as reducing the tract level induces a no-neighbour outcome at the model output zone level.

boston_94a <- aggregate(boston_489[,"NOX_ID"], list(boston_489$NOX_ID), unique)
nb_q_94a <- spdep::poly2nb(boston_94a)
NOX_ID_no_neighs <- boston_94a$NOX_ID[which(spdep::card(nb_q_94a) == 0)]
boston_487 <- boston_489[is.na(match(boston_489$NOX_ID, NOX_ID_no_neighs)),]
boston_93 <- aggregate(boston_487[, "NOX_ID"], list(ids = boston_487$NOX_ID), unique)
row.names(boston_93) <- as.character(boston_93$NOX_ID)
nb_q_93 <- spdep::poly2nb(boston_93, row.names=unique(as.character(boston_93$NOX_ID)))

The lme4 package lets us add an IID unstructured random effect at the model output zone level:

form <- formula(log(median) ~ CRIM + ZN + INDUS + CHAS + I((NOX*10)^2) + I(RM^2) + 
                  AGE + log(DIS) + log(RAD) + TAX + PTRATIO + I(BB/100) + 
                  log(I(LSTAT/100)))

The ZN, INDUS, NOX, RAD, TAX and PTRATIO variables show effectively no variability within the TASSIM zones, so in a multilevel model the random effect may absorb their influence.

library(lme4)
MLM <- lmer(update(form, . ~ . + (1 | NOX_ID)), data=boston_487, REML=FALSE)

copying the random effect into the "sf" object for mapping below.

boston_93$MLM_re <- ranef(MLM)[[1]][,1]

Two packages, hglm and HSAR, offer SAR upper level spatially structured random effects, and require the specification of a sparse matrix mapping the upper level enities onto lower level entities, and sparse binary weights matrices:

library(Matrix)
suppressMessages(library(MatrixModels))
Delta <- as(model.Matrix(~ -1 + as.factor(NOX_ID), data=boston_487, sparse=TRUE),
            "dgCMatrix")
M <- as(spdep::nb2listw(nb_q_93, style="B"), "CsparseMatrix")

The extension of hglm to sparse spatial setting extended its facilities (Alam, Rönnegård, and Shen 2015), and also permits the modelling of discrete responses. First we fit an IID random effect:

suppressPackageStartupMessages(library(hglm))
y_hglm <- log(boston_487$median)
X_hglm <- model.matrix(lm(form, data=boston_487))
suppressWarnings(HGLM_iid <- hglm(y=y_hglm, X=X_hglm, Z=Delta))

followed by a SAR model at the upper level (corresponding to a spatial error (SEM) model), which reports the spatially structured random effect without fully converging, so coefficient standard errors are not available:

suppressWarnings(HGLM_sar <- hglm(y=y_hglm, X=X_hglm, Z=Delta, rand.family=SAR(D=M)))
boston_93$HGLM_re <- unname(HGLM_iid$ranef)
boston_93$HGLM_ss <- HGLM_sar$ranef[,1]

The HSAR package is restricted to the Gaussian response case, and fits an upper level SEM using MCMC; if W= is a lower level weights matrix, it will also fit a lower level SLM (Guanpeng Dong and Harris 2015; G. Dong et al. 2015):

library(HSAR)
suppressWarnings(HSAR <- hsar(form, data=boston_487, W=NULL, M=M, Delta=Delta, 
                              burnin=500, Nsim=2500, thinning=1))
boston_93$HSAR_ss <- HSAR$Mus[1,]

The R2BayesX package provides flexible support for structured additive regression models, including spatial multilevel models. The models include an IID unstructured random effect at the upper level using the "re" specification (Umlauf et al. 2015); we choose the "MCMC"method:

suppressPackageStartupMessages(library(R2BayesX))
BX_iid <- bayesx(update(form, . ~ . + sx(NOX_ID, bs="re")), family="gaussian",
data=boston_487, method="MCMC", iterations=12000, burnin=2000, step=2, seed=123)
boston_93$BX_re <- BX_iid$effects["sx(NOX_ID):re"][[1]]$Mean

and the "mrf" (Markov random field) spatially structured random effect specification based on a graph derived from converting a suitable "nb" object for the upper level. The "region.id" attribute of the "nb" object needs to contain values corresponding the the indexing variable.

RBX_gra <- nb2gra(nb_q_93)
BX_mrf <- bayesx(update(form, . ~ . + sx(NOX_ID, bs="mrf", map=RBX_gra)), 
                 family="gaussian", data=boston_487, method="MCMC", 
                 iterations=12000, burnin=2000,step=2, seed=123)
boston_93$BX_ss <- BX_mrf$effects["sx(NOX_ID):mrf"][[1]]$Mean

In a very similar way, mgcv::gam() can take an "mrf" term using a suitable "nb" object for the upper level. In this case the "nb" object needs to have the contents of the "region.id" attribute copied as the names of the neighbour list components, and the indexing variable needs to be a factor (Wood 2017) (the "REML" method of bayesx() gives the same result here):

library(mgcv)
names(nb_q_93) <- attr(nb_q_93, "region.id")
boston_487$NOX_ID <- as.factor(boston_487$NOX_ID)
GAM_MRF <- gam(update(form, . ~ . + s(NOX_ID, bs="mrf", xt=list(nb=nb_q_93))),
               data=boston_487, method="REML")
boston_93$GAM_ss <- aggregate(predict(GAM_MRF, type="terms", se=FALSE)[,14],
                              list(boston_487$NOX_ID), mean)$x

In the cases of hglm(), bayesx() and gam(), we could also model discrete responses without further major difficulty, and bayesx() and gam() also facilitate the generalization of functional form fitting for included covariates.

res <- rbind(iid_lmer=summary(MLM)$coefficients[6, 1:2],
             iid_hglm=summary(HGLM_iid)$FixCoefMat[6, 1:2], 
             iid_BX=BX_iid$fixed.effects[6, 1:2], 
             sar_hsar=c(HSAR$Mbetas[1, 6], HSAR$SDbetas[1, 6]),
             mrf_BX=BX_mrf$fixed.effects[6, 1:2], 
             mrf_GAM=c(summary(GAM_MRF)$p.coeff[6], summary(GAM_MRF)$se[6]))

Unfortunately, the coefficient estimates for the air pollution variable for these multilevel models are not helpful. All remain negative, but the inclusion of the model output zone level effects, be they IID or spatially structured, suggest that it is hard to disentangle the influence of the scale of observation from that of covariates observed at that scale.

suppressPackageStartupMessages(library(ggplot2))
df_res <- as.data.frame(res)
names(df_res) <- c("mean", "sd")
limits <- aes(ymax = mean + qnorm(0.975)*sd, ymin=mean + qnorm(0.025)*sd)
df_res$model <- row.names(df_res)
p <- ggplot(df_res, aes(y=mean, x=model)) + geom_point() + geom_errorbar(limits) + geom_hline(yintercept = 0, col="#EB811B") + coord_flip()
p + ggtitle("NOX coefficients and error bars") + theme(plot.background = element_rect(fill = "transparent",colour = NA), legend.background = element_rect(colour = NA, fill = "transparent"))

This map shows that the model output zone level IID random effects are very similar across the three model fitting functions reported.

library(tmap)
tm_shape(boston_93) + tm_fill(c("MLM_re", "HGLM_re" , "BX_re"), midpoint=0, title="IID")  + tm_facets(free.scales=FALSE) + tm_borders(lwd=0.3, alpha=0.4) + tm_layout(panel.labels=c("MLM", "HGLM", "BX"))

The spatially structured SAR and MRF random effects (MRF term in the gam() case) are also very similar, with the MRF somewhat less smoothed than the SAR values.

tm_shape(boston_93) + tm_fill(c("HGLM_ss", "HSAR_ss", "BX_ss", "GAM_ss"), midpoint=0, title="SS")  + tm_facets(free.scales=FALSE) + tm_borders(lwd=0.3, alpha=0.4) + tm_layout(panel.labels=c("HGLM SAR", "HSAR SAR", "BX MRF", "GAM MRF"))

Spatially structured random effects

Although there is still a great need for more thorough comparative studies of model fitting functions for spatial regression, there has been much progress over recent years. This is not completed as a proper set of comparisons with this data set - (R. S. Bivand et al. 2017; R. Bivand 2017) contains comparisons for Beijing land parcels and a housing data set for SW Norway.

NC SIDS data

nc <- st_read(system.file("shapes/sids.shp", package="spData")[1], quiet=TRUE)
st_crs(nc) <- "EPSG:4267"
row.names(nc) <- as.character(nc$FIPSNO)
nc$ft.SID74 <- sqrt(1000)*(sqrt(nc$SID74/nc$BIR74) + sqrt((nc$SID74+1)/nc$BIR74))
nc$ft.NWBIR74 <- sqrt(1000)*(sqrt(nc$NWBIR74/nc$BIR74) + sqrt((nc$NWBIR74+1)/nc$BIR74))
gal_file <- system.file("weights/ncCR85.gal", package="spData")[1]
ncCR85 <- spdep::read.gal(gal_file, region.id=nc$FIPSNO)
nc_lw <- spdep::nb2listw(ncCR85, style="B")
nc_W <- as(nc_lw, "CsparseMatrix")

The approach taken by social scientists including economists, and some others has been to approach this through simultaneous autoregressive approaches, where the response is modelled using fixed covariates, and the residual process is modelled by optimising a log likelihood function. The spatialreg package provides spautolm() and errorsarlm():

m1 <- spautolm(ft.SID74 ~ ft.NWBIR74, weights=BIR74, data=nc, listw=nc_lw, family="SAR")

When we include the covariate, the spatial error coefficient contributes little:

summary(m1)

## 
## Call: spautolm(formula = ft.SID74 ~ ft.NWBIR74, data = nc, listw = nc_lw, 
##     weights = BIR74, family = "SAR")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.83149 -0.44934  0.09456  0.56278  2.90208 
## 
## Coefficients: 
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 1.5816687  0.2507427  6.3079 2.828e-10
## ft.NWBIR74  0.0366864  0.0069608  5.2704 1.361e-07
## 
## Lambda: 0.035437 LR test value: 1.787 p-value: 0.18129 
## Numerical Hessian standard error of lambda: 0.026096 
## 
## Log likelihood: -119.6704 
## ML residual variance (sigma squared): 1304.4, (sigma: 36.117)
## Number of observations: 100 
## Number of parameters estimated: 4 
## AIC: 247.34

It is unusual to present maps of the spatially structured random effect in cases where the simultaneous autoregressive approach is used, but it is fully possible using components of the returned model object (the vector is doubled for comparison with the CAR version):

nc$SAR_ssre <- 2*as.vector((m1$lambda * nc_W) %*% m1$Y - (m1$lambda * nc_W) %*% (m1$X %*% m1$fit$coefficients))
library(tmap)
tm_shape(nc) + tm_fill(c("ft.SID74", "SAR_ssre"), midpoint=c(NA, 0))

The other maximum likelihood implementation gives the same results, but provides a Hausman test for shifts in the covariate coefficients between the aspatial and spatial estimates (Pace and LeSage 2008); there is none:

m1a <- errorsarlm(ft.SID74 ~ ft.NWBIR74, weights=BIR74, data=nc, listw=nc_lw)
summary(m1a, Hausman=TRUE)

## 
## Call:errorsarlm(formula = ft.SID74 ~ ft.NWBIR74, data = nc, listw = nc_lw, 
##     weights = BIR74)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.83149 -0.44934  0.09456  0.56278  2.90208 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 1.5816687  0.2507427  6.3079 2.828e-10
## ft.NWBIR74  0.0366864  0.0069608  5.2704 1.361e-07
## 
## Lambda: 0.035437, LR test value: 1.787, p-value: 0.18129
## Asymptotic standard error: 0.029167
##     z-value: 1.2149, p-value: 0.22439
## Wald statistic: 1.4761, p-value: 0.22439
## 
## Log likelihood: -119.6704 for error model
## ML residual variance (sigma squared): 1304.4, (sigma: 36.117)
## Number of observations: 100 
## Number of parameters estimated: 4 
## AIC: 247.34, (AIC for lm: 247.13)
## Hausman test: 1.565, df: 2, p-value: 0.45726

It also lets us add Durbin= terms, that is spatially lagged covariates:

m1b <- errorsarlm(ft.SID74 ~ ft.NWBIR74, weights=BIR74, data=nc, listw=nc_lw, Durbin=TRUE)
summary(m1b, Hausman=TRUE)

## 
## Call:errorsarlm(formula = ft.SID74 ~ ft.NWBIR74, data = nc, listw = nc_lw, 
##     weights = BIR74, Durbin = TRUE)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -1.8705347 -0.4488357  0.0094254  0.4864939  2.7406880 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                    Estimate  Std. Error z value  Pr(>|z|)
## (Intercept)      2.28212888  0.33548214  6.8025 1.028e-11
## ft.NWBIR74       0.03532229  0.00811453  4.3530 1.343e-05
## lag.(Intercept) -0.15336745  0.06073121 -2.5253   0.01156
## lag.ft.NWBIR74   0.00098902  0.00159099  0.6216   0.53418
## 
## Lambda: 0.028053, LR test value: 1.036, p-value: 0.30874
## Asymptotic standard error: 0.029824
##     z-value: 0.94064, p-value: 0.34689
## Wald statistic: 0.8848, p-value: 0.34689
## 
## Log likelihood: -115.1974 for error model
## ML residual variance (sigma squared): 1195.8, (sigma: 34.58)
## Number of observations: 100 
## Number of parameters estimated: 6 
## AIC: 242.39, (AIC for lm: 241.43)
## Hausman test: 1.8315, df: 4, p-value: 0.76671

and to present the by-covariate effects taking into account the unlagged (direct) and lagged (indirect) covariates and the sum of the coefficients (total):

summary(impacts(m1b))

## Impact measures (SDEM, estimable, n):
##                Direct     Indirect      Total
## ft.NWBIR74 0.03532229 0.0009890218 0.03631131
## ========================================================
## Standard errors:
##                 Direct    Indirect       Total
## ft.NWBIR74 0.008114531 0.001590986 0.007277556
## ========================================================
## Z-values:
##              Direct  Indirect    Total
## ft.NWBIR74 4.352968 0.6216407 4.989493
## 
## p-values:
##            Direct     Indirect Total     
## ft.NWBIR74 1.3431e-05 0.53418  6.0538e-07

However, our model may suffer from not using a mixed model approach to a count response; the simultaneous autoregressive models are mostly used with Gaussian responses. One possibility is to employ the hierarchical generalized linear model approach from the hglm package. First we’ll fit an unstructured IID (independent and identically distributed) random effect (Alam, Rönnegård, and Shen 2015):

library(hglm)
E <- nc$BIR74 * sum(nc$SID74)/sum(nc$BIR74)
HGLM_iid <- hglm(fixed=SID74 ~ ft.NWBIR74, random= ~ 1|CRESS_ID, offset=log(E), weights=BIR74,
                 data=nc, family=poisson(link=log))

The random effects also have their own standard errors, so we can order and display them with error bars in a forest or caterpillar plot:

ranef_iid <- unname(summary(HGLM_iid, print.ranef=TRUE)$RandCoefMat)
metafor::forest(ranef_iid[,1], ranef_iid[,2], subset=order(ranef_iid[,1], decreasing=TRUE), 
        slab=NA, annotate=FALSE, lty=c("solid","blank"), pch=19, psize=2, cex.lab=1, cex.axis=1)

We can also fit a weighted Poisson simultaneous autoregressive model, and examine the random effects:

HGLM_sar <- hglm(fixed=SID74 ~ ft.NWBIR74, random= ~ 1|CRESS_ID, offset=log(E), weights=BIR74, 
                 data=nc, family=poisson(link=log), rand.family=SAR(D=nc_W))
ranef_sar <- unname(summary(HGLM_sar, print.ranef=TRUE)$RandCoefMat)
metafor::forest(ranef_sar[,1], ranef_sar[,2], subset=order(ranef_sar[,1], decreasing=TRUE), 
        slab=NA, annotate=FALSE, lty=c("solid","blank"), pch=19, psize=2, cex.lab=1, cex.axis=1)

There is not much difference between the IID and the SAR spatially structured random effects:

nc$HGLM_re <- ranef_iid[,1]
nc$HGLM_ss_SAR <- ranef_sar[,1]
tm_shape(nc) + tm_fill(c("HGLM_re", "HGLM_ss_SAR"), midpoint=c(0), title="Poisson HGLM RE") +
  tm_facets(free.scales=FALSE) + tm_layout(panel.labels=c("IID", "SAR SSRE"))

Conditional and MRF autoregressive approaches

Most epidemiological applications use conditional autoregressive approaches, where some (like spautolm() and the hglm() implementations) fit a spatial coefficient, but many fit an intrinsic CAR. First the spautolm() and the hglm() implementations:

m1c <- spautolm(ft.SID74 ~ ft.NWBIR74, weights=BIR74, data=nc, listw=nc_lw, family="CAR")
summary(m1c)

## 
## Call: spautolm(formula = ft.SID74 ~ ft.NWBIR74, data = nc, listw = nc_lw, 
##     weights = BIR74, family = "CAR")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -1.890805 -0.483471  0.070376  0.575906  2.824220 
## 
## Coefficients: 
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 1.4561763  0.2499558  5.8257 5.686e-09
## ft.NWBIR74  0.0410281  0.0069417  5.9103 3.414e-09
## 
## Lambda: 0.06028 LR test value: 1.6364 p-value: 0.20082 
## Numerical Hessian standard error of lambda: 0.053566 
## 
## Log likelihood: -119.7458 
## ML residual variance (sigma squared): 1302, (sigma: 36.083)
## Number of observations: 100 
## Number of parameters estimated: 4 
## AIC: 247.49

Again, we can calculate something that represents the spatial patterning of the spatial process (termed “signal” in the documentation of spatialreg::predict.sarlm), but it is not scaled in the same way as the hgml() random effects (and importantly we do not have standard errors):

nc$CAR_ssre <- as.vector((m1c$lambda * nc_W) %*% m1c$Y - 
                           (m1c$lambda * nc_W) %*% (m1c$X %*% m1c$fit$coefficients))
tm_shape(nc) + tm_fill(c("SAR_ssre", "CAR_ssre"), midpoint=c(0), title="Gauss ML RE") +
  tm_facets(free.scales=FALSE) + tm_layout(panel.labels=c("SAR SSRE", "CAR SSRE"))

Fitting the HGLM CAR model is just like the SAR model, and the forest plot of the spatially structured random effect is similar. Recall that spautolm() is fitting a Gaussian model, but hglm() is fitting a Poisson model, arguably better suited to count data. This means that the scalings of the random effects will vary in scale:

HGLM_car <- hglm(fixed=SID74 ~ ft.NWBIR74, random= ~ 1|CRESS_ID, offset=log(E), weights=BIR74, 
                 data=nc, family=poisson(link=log), rand.family=CAR(D=nc_W))
ranef_car <- unname(summary(HGLM_car, print.ranef=TRUE)$RandCoefMat)
metafor::forest(ranef_car[,1], ranef_car[,2], subset=order(ranef_car[,1], decreasing=TRUE), 
        slab=NA, annotate=FALSE, lty=c("solid","blank"), pch=19, psize=2, cex.lab=1, cex.axis=1)

nc$HGLM_ss_CAR <- ranef_car[,1]
tm_shape(nc) + tm_fill(c("HGLM_ss_CAR", "HGLM_ss_SAR"), midpoint=c(0), title="Poisson HGLM RE") +
  tm_facets(free.scales=FALSE) + tm_layout(panel.labels=c("CAR SSRE", "SAR SSRE"))

To use a generalized additive mixed model (mgcv::gam() with an "mrf" random effect), and some other mixed models, the areal entities need to be grouped (done in the first exercise), and we can try a flexible fit on the covariate:

nc$LM <- as.numeric(interaction(nc$L_id, nc$M_id))
aggLM <- aggregate(nc[,"LM"], list(nc$LM), head, n=1)
aggnb <- spdep::poly2nb(aggLM)
library(mgcv)
names(aggnb) <- as.character(aggLM$Group.1)
nc$LM <- as.factor(nc$LM)
GAM_mrf <- gam(SID74 ~ s(ft.NWBIR74) + s(LM, bs="mrf", xt=list(nb=aggnb)), offset=log(E), weights=BIR74, data=nc, family=poisson(link=log))
summary(GAM_mrf)

And plot the covariate smooth term:

library(mgcv)
plot(GAM_mrf)

The forest plot is obviously grouped too:

GAM_mrf_re <- predict(GAM_mrf, type="terms", se=TRUE)
metafor::forest(GAM_mrf_re$fit[,2], GAM_mrf_re$se.fit[,2], subset=order(GAM_mrf_re$fit[,1], decreasing=TRUE), 
        slab=NA, annotate=FALSE, lty=c("solid","blank"), pch=19, psize=2, cex.lab=1, cex.axis=1)

as is the RE map:

nc$GAM_mrf_re <- GAM_mrf_re$fit[,2]
tm_shape(nc) + tm_fill(c("GAM_mrf_re"), midpoint=c(0), title="Poisson GAM MRF RE")

US Census ACS dataset

df_tracts <- st_read("df_tracts.gpkg")

## Reading layer `df_tracts' from data source 
##   `/home/rsb/und/uam21/units_2/df_tracts.gpkg' using driver `GPKG'
## Simple feature collection with 71353 features and 28 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -124.7332 ymin: 24.5447 xmax: -66.9499 ymax: 49.38436
## Geodetic CRS:  NAD83

tr_form <- log(med_inc_cv) ~ log1p(vacancy_rate) + log1p(old_rate) + log1p(black_rate) + log1p(hisp_rate) + log1p(group_pop) + log1p(dens)

df_tracts$ST <- factor(substring(df_tracts$GEOID, 1, 2))
library(mgcv)
GAM_RE <- gam(update(tr_form, . ~ . + s(ST, bs="re")), data=df_tracts)
summary(GAM_RE)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## log(med_inc_cv) ~ log1p(vacancy_rate) + log1p(old_rate) + log1p(black_rate) + 
##     log1p(hisp_rate) + log1p(group_pop) + log1p(dens) + s(ST, 
##     bs = "re")
## 
## Parametric coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -6.410054   0.039241 -163.35   <2e-16 ***
## log1p(vacancy_rate)  0.356864   0.025239   14.14   <2e-16 ***
## log1p(old_rate)      0.426732   0.004054  105.27   <2e-16 ***
## log1p(black_rate)    0.434330   0.012156   35.73   <2e-16 ***
## log1p(hisp_rate)     0.515721   0.014432   35.73   <2e-16 ***
## log1p(group_pop)     0.024723   0.000792   31.21   <2e-16 ***
## log1p(dens)          0.045569   0.002023   22.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##         edf Ref.df     F p-value    
## s(ST) 46.79     48 49.96  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.255   Deviance explained = 25.5%
## GCV = 0.19839  Scale est. = 0.19824   n = 71353

states <- aggregate(df_tracts["ST"], list(df_tracts$ST), head, n=1)
nb_st <- spdep::poly2nb(states, row.names=as.character(states$ST))
names(nb_st) <- as.character(states$ST)
GAM_MRF <- gam(update(tr_form, . ~ . + s(ST, bs="mrf", k=49, xt=list(nb=nb_st))), data=df_tracts)
summary(GAM_MRF)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## log(med_inc_cv) ~ log1p(vacancy_rate) + log1p(old_rate) + log1p(black_rate) + 
##     log1p(hisp_rate) + log1p(group_pop) + log1p(dens) + s(ST, 
##     bs = "mrf", k = 49, xt = list(nb = nb_st))
## 
## Parametric coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -6.4018825  0.0356203 -179.73   <2e-16 ***
## log1p(vacancy_rate)  0.3572201  0.0252390   14.15   <2e-16 ***
## log1p(old_rate)      0.4270174  0.0040538  105.34   <2e-16 ***
## log1p(black_rate)    0.4318845  0.0121718   35.48   <2e-16 ***
## log1p(hisp_rate)     0.5151583  0.0144364   35.69   <2e-16 ***
## log1p(group_pop)     0.0247906  0.0007921   31.30   <2e-16 ***
## log1p(dens)          0.0456938  0.0020231   22.59   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##         edf Ref.df    F p-value    
## s(ST) 46.25     48 50.1  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.255   Deviance explained = 25.5%
## GCV = 0.19839  Scale est. = 0.19824   n = 71353

cat(capture.output(print(anova(GAM_RE, GAM_MRF, test="Chisq")))[c(1, 2, 9:11)], sep="\n")

## Analysis of Deviance Table
## 
##   Resid. Df Resid. Dev        Df Deviance Pr(>Chi)  
## 1     71298      14134                              
## 2     71298      14134 -0.081294 -0.11743  0.03742 *

Beijing leased residential land parcels data set (2003-2009)

The Beijing leased residential land parcels data set has been made available in the HSAR package. There are 1117 observed land parcels, grouped into 111 districts. The data are for most of the districts for which parcel price data was available for the 2003-2009 period; some isolated districs were also excluded.

library(HSAR)
data(landprice)
data(landSPDF)

The parcel data as provided are in two objects, a data.frame and a SpatialPointsDataFrame, which need to be merged and ordered by district ID (obs relate to the original parcel IDs):

landprice1 <- st_as_sf(landSPDF)
landprice1 <- merge(landprice1, landprice, by="obs")
landprice2 <- landprice1[order(landprice1$district.id.x),]

Categorical variables, such as the year relating to the price, and the district ID, should be specified as factor objects, to permit the creation of dummies on-the-fly:

landprice2$fyear <- factor(landprice2$year + 2003)
landprice2$f_district.id.x <- factor(landprice2$district.id.x)

Because formula objects can include simple functions, such as log of variables, we revert pre-computed logs:

landprice2$price <- exp(landprice2$lnprice)
landprice2$area <- exp(landprice2$lnarea)
landprice2$Dcbd <- exp(landprice2$lndcbd)
landprice2$Dsubway <- exp(landprice2$dsubway)
landprice2$Dpark <- exp(landprice2$dpark)
landprice2$Dele <- exp(landprice2$dele)

Parcel-level spatial weights

We use the weights specified in (G. Dong et al. 2015), an exponential decay function with a distance threshold of 1.5km; they are stored in a sparse listw object. We need to set zero.policy because seven parcels have no neighbours at this threshold:

dnb1.5 <- spdep::dnearneigh(landprice2, 0, 1500, row.names=row.names(landprice2))
dists <- spdep::nbdists(dnb1.5, st_coordinates(landprice2))
edists <- lapply(dists, function(x) exp((-((x/1000)^2))/(1.5^2)))
ozpo <- spdep::set.ZeroPolicyOption(TRUE)
ozpo1 <- set.ZeroPolicyOption(TRUE)
lw <- spdep::nb2listw(dnb1.5, glist=edists, style="W")

## Warning in spdep::nb2listw(dnb1.5, glist = edists, style = "W"): zero sum
## general weights

W <- as(lw, "CsparseMatrix")
trs <- trW(W, m=50)

Formula objects

For over twenty years, data.frame and formula objects have defined S and R syntax. Use of update methods allows flexibility and re-use; the data= argument points to the object containing the named variables (columns):

form <- log(price) ~ log(area) + log(Dcbd) + log(Dele) + log(Dpark) + log(Dsubway) + 
  crimerate + popden + fyear
OLS <- lm(form, data=landprice2)

Using fitted model objects

There are many standard methods for fitted model objects, especially print and summary methods, which are not shown here. Let’s use Moran’s \(I\) to test for residual spatial autocorrelation (and/or other mis-specifications):

spdep::lm.morantest(OLS, lw)

## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = form, data = landprice2)
## weights: lw
## 
## Moran I statistic standard deviate = 15.1, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     0.1944768641    -0.0054494313     0.0001753028

Same framework with spatial models

It makes sense to use the formula/data framework for functions fitting spatial econometrics models, such as SLX; we need a spatial weights listw object, qualified by accepting parcels without neighbours:

SLX <- lmSLX(form, data=landprice2, listw=lw, zero.policy=TRUE)

SLX residual autocorrelation

The fitted object still inherits from lm, so functions and methods expecting such an object will still work. There is still a lot of spatial autocorrelation in the residuals:

class(SLX)

## [1] "SlX" "lm"

spdep::lm.morantest(SLX, lw)

## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1],
## collapse = "+"))), data = as.data.frame(x), weights = weights)
## weights: lw
## 
## Moran I statistic standard deviate = 14.294, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     0.1768367391    -0.0095750057     0.0001700666

Impacts: local and global

The SLX model has come into focus recently because of its ease of estimation, and because the coefficients may be treated as direct and indirect impacts, with total impacts calculated by linear combination:

imps_SLX <- spatialreg::impacts(SLX)
imps_SLX

## Impact measures (SlX, estimable):
##                    Direct     Indirect        Total
## log(area)    -0.027393865  0.090812258  0.063418393
## log(Dcbd)    -0.586485128  0.362485133 -0.223999995
## log(Dele)     0.017324132 -0.159256358 -0.141932226
## log(Dpark)   -0.091259953 -0.212539298 -0.303799251
## log(Dsubway) -0.176849400 -0.146595779 -0.323445180
## crimerate     0.003700684  0.006267203  0.009967887
## popden        0.003018936  0.038261481  0.041280417
## fyear2004    -0.203603540  0.448210479  0.244606939
## fyear2005    -0.037845735  0.874443554  0.836597820
## fyear2006    -0.117303538  0.212143967  0.094840429
## fyear2007     0.689975245 -0.246715186  0.443260059
## fyear2008     0.465854623 -0.266940282  0.198914342
## fyear2009     2.203749162  0.406971450  2.610720612

Maximum likelihood — error model

The spatial error model (SEM) may be estimated by maximum likelihood (case weights may also be used), here using eigenvalues of the spatial weights to compute the Jacobian (R. S. Bivand, Hauke, and Kossowski 2013). The Hausman test (Pace and LeSage 2008) examines whether the SEM coefficients differ from OLS coefficients:

e <- eigenw(lw)
int <- 1/range(e)
SEM <- errorsarlm(form, data=landprice2, listw=lw, interval=int, control=list(pre_eig=e), zero.policy=TRUE)
Hausman.test(SEM)

## 
##  Spatial Hausman test (asymptotic)
## 
## data:  NULL
## Hausman test = 24.24, df = 14, p-value = 0.04285

GMM estimation of SEM

The sphet package (Piras 2010) provides a wide range of model fitting options, terming the spatial coefficient on the error \(\rho\), rather than \(\lambda\), and the other way round for the lag coefficient on the dependent variable:

library(sphet)

## 
## Attaching package: 'sphet'

## The following object is masked from 'package:spatialreg':
## 
##     impacts

GM_SEM <- spreg(form, data=landprice2, listw=lw, model="error")

ML Durbin SEM

The spatial Durbin error model (SDEM), like SLX, adds the spatially lagged independent variables on the right hand side, and may be estimated by maximum likelihood:

SDEM <- errorsarlm(form, data=landprice2, listw=lw, etype="emixed",
 interval=int, control=list(pre_eig=e), zero.policy=TRUE)
Hausman.test(SDEM)

## 
##  Spatial Hausman test (asymptotic)
## 
## data:  NULL
## Hausman test = 12.872, df = 27, p-value = 0.99

SDEM impacts

SDEM impacts, like those for SLX, are calculated by linear combination:

imps_SDEM <- spatialreg::impacts(SDEM)
imps_SDEM

## Impact measures (SDEM, estimable):
##                    Direct      Indirect        Total
## log(area)    -0.025312192  0.0899318945  0.064619702
## log(Dcbd)    -0.588066018  0.3297094494 -0.258356569
## log(Dele)    -0.002311491 -0.1215961308 -0.123907622
## log(Dpark)   -0.081889383 -0.2187482979 -0.300637681
## log(Dsubway) -0.168100998 -0.1136152579 -0.281716256
## crimerate     0.006368131 -0.0007289118  0.005639219
## popden        0.004766271  0.0394667815  0.044233052
## fyear2004    -0.206852031  0.2244677181  0.017615687
## fyear2005    -0.047901500  0.5532040754  0.505302575
## fyear2006    -0.132401030  0.0290397225 -0.103361308
## fyear2007     0.693289819 -0.4672462567  0.226043562
## fyear2008     0.485618722 -0.1354646846  0.350154038
## fyear2009     2.299169404  0.7608231609  3.059992565

Maximum likelihood spatial lag model

The spatial lag model (SAR) is estimated using lagsarlm by maximum likelihood. There is also an LM test for residual autocorrelation:

SAR <- lagsarlm(form, data=landprice2, listw=lw, type="lag", interval=int, control=list(pre_eig=e), zero.policy=TRUE)
digits <- max(5, .Options$digits - 3)
cat("test value: ", format(signif(SAR$LMtest, digits)), ", p-value: ",
 format.pval((1 - pchisq(SAR$LMtest, 1)), digits), "\n", sep="")

## test value: 58.514, p-value: 2.0206e-14

SAR impacts

The impacts method for SAR (and SDM) fitted model objects uses simple samples from the multivariate distribution of the coefficients and their covariances for inferences, and uses traces of powers of the spatial weights:

spatialreg::impacts(SAR, tr=trs)

## Impact measures (lag, trace):
##                    Direct     Indirect        Total
## log(area)    -0.005066435 -0.002046688 -0.007113123
## log(Dcbd)    -0.167881960 -0.067819272 -0.235701232
## log(Dele)    -0.048279743 -0.019503567 -0.067783310
## log(Dpark)   -0.211249350 -0.085338395 -0.296587745
## log(Dsubway) -0.173562177 -0.070113909 -0.243676085
## crimerate     0.004947226  0.001998531  0.006945756
## popden        0.024700110  0.009978103  0.034678213
## fyear2004    -0.186140380 -0.075195125 -0.261335505
## fyear2005    -0.027768036 -0.011217453 -0.038985489
## fyear2006    -0.088974266 -0.035942933 -0.124917199
## fyear2007     0.540520655  0.218354118  0.758874773
## fyear2008     0.400969047  0.161979458  0.562948504
## fyear2009     2.266803638  0.915720619  3.182524258

GMM estimation of SAR

The sphet package (Piras 2010) provides a number of ways of fitting this model, calling the coefficient \(\lambda\):

library(sphet)
GM_SAR <- spreg(form, data=landprice2, listw=lw, model="lag")

GMM SAR impacts

impacts(GM_SAR, tr=trs)

## Impact measures (lag, trace):
##                    Direct     Indirect        Total
## log(area)    -0.006564801 -0.001187541 -0.007752342
## log(Dcbd)    -0.206668996 -0.037385428 -0.244054424
## log(Dele)    -0.065824134 -0.011907269 -0.077731403
## log(Dpark)   -0.245355267 -0.044383589 -0.289738855
## log(Dsubway) -0.207332309 -0.037505418 -0.244837727
## crimerate     0.006224452  0.001125974  0.007350426
## popden        0.028487010  0.005153163  0.033640173
## fyear2004    -0.175330948 -0.031716526 -0.207047473
## fyear2005    -0.006132623 -0.001109362 -0.007241984
## fyear2006    -0.103596189 -0.018740053 -0.122336241
## fyear2007     0.544118927  0.098428499  0.642547426
## fyear2008     0.397535116  0.071912192  0.469447308
## fyear2009     2.187572906  0.395721425  2.583294332

Spatial lag Durbin model

As with the SLX and SDEM models, the lags of the independent variables are made automatically (this needs revisiting for SLX and SDEM to permit different and possibly multiple matrices to be used):

SDM <- lagsarlm(form, data=landprice2, listw=lw, type="mixed", interval=int, control=list(pre_eig=e), zero.policy=TRUE)
digits <- max(5, .Options$digits - 3)
cat("test value: ", format(signif(SDM$LMtest, digits)), ", p-value: ",
 format.pval((1 - pchisq(SDM$LMtest, 1)), digits), "\n", sep="")

## test value: 0.17127, p-value: 0.67898

Tests between SLX, SDM and SDEM

We can use Likelihood ratio tests between nesting models

cat(capture.output(LR.Sarlm(SAR, SDM))[c(7,8,5)], sep="\n")

## Log likelihood of SAR Log likelihood of SDM 
##             -1334.149             -1296.823 
## Likelihood ratio = -74.652, df = 13, p-value = 1.105e-10

cat(capture.output(LR.Sarlm(SEM, SDM))[c(7,8,5)], sep="\n")

## Log likelihood of SEM Log likelihood of SDM 
##             -1309.654             -1296.823 
## Likelihood ratio = -25.661, df = 13, p-value = 0.01887

cat(capture.output(LR.Sarlm(SEM, SDEM))[c(7,8,5)], sep="\n")

##  Log likelihood of SEM Log likelihood of SDEM 
##              -1309.654              -1296.305 
## Likelihood ratio = -26.697, df = 13, p-value = 0.01369

Tests between SLX, SDM and SDEM

cat(capture.output(LR.Sarlm(SLX, SDM))[c(7,8,5)], sep="\n")

## Log likelihood of SLX Log likelihood of SDM 
##             -1349.730             -1296.823 
## Likelihood ratio = -105.81, df = 1, p-value < 2.2e-16

cat(capture.output(LR.Sarlm(SLX, SDEM))[c(7,8,5)], sep="\n")

##  Log likelihood of SLX Log likelihood of SDEM 
##              -1349.730              -1296.305 
## Likelihood ratio = -106.85, df = 1, p-value < 2.2e-16

SDM impacts

Here we’ll take samples to use later:

set.seed(1)
imps <- spatialreg::impacts(SDM, tr=trs, R=2000)
imps

## Impact measures (mixed, trace):
##                    Direct     Indirect        Total
## log(area)    -0.023812364  0.113019489  0.089207125
## log(Dcbd)    -0.025324844 -0.252607004 -0.277931848
## log(Dele)     0.015649846 -0.158102250 -0.142452404
## log(Dpark)   -0.023813315 -0.298648573 -0.322461888
## log(Dsubway) -0.154160382 -0.156320343 -0.310480725
## crimerate     0.005102878  0.002823557  0.007926435
## popden        0.005447242  0.044886407  0.050333649
## fyear2004    -0.200503511  0.424641128  0.224137618
## fyear2005    -0.040082358  0.882319752  0.842237393
## fyear2006    -0.121239177  0.222404995  0.101165817
## fyear2007     0.708046770 -0.306736303  0.401310467
## fyear2008     0.460894440 -0.268754916  0.192139524
## fyear2009     2.266709634  1.070861985  3.337571619

Impacts by simple linear algebra

Let’s also explore the impacts of the log(area) variable the manual way; the direct impact is the same, the total differs very slightly:

IrW1 <- spdep::invIrW(lw, rho=coef(SDM)[1])
n <- nrow(landprice2)
S_area <- IrW1 %*% ((diag(n) * coef(SDM)[3]) + (coef(SDM)[16] * W))
sum(S_area)/n

## [1] 0.08847212

sum(diag(S_area))/n

## [1] -0.02381236

Total impacts by incremented prediction

If we make a prediction (using GSoC 2015 code by Martin Gubri based on (Goulard, Laurent, and Thomas-Agnan 2017)) from the fitted model, increment area, and make a new prediction, the mean difference is the total impact:

newdata <- landprice2
suppressWarnings(p0 <- predict(SDM, newdata=newdata, listw=lw))
newdata$area <- exp(log(newdata$area)+1)
suppressWarnings(p1 <- predict(SDM, newdata=newdata, listw=lw))
mean(p1-p0)

## [1] 0.08847212

MCMC spatial Durbin fitting

Following GSoC 2011, parts of the Spatial Econometrics toolbox were translated into R, with default 2500 draws and 500 omitted as burnin:

set.seed(1)
BSDM <- spBreg_lag(form, data=landprice2, listw=lw, type="Durbin")

MCMC impacts

As we already have the draws, we do not need to sample again to be able to infer:

impsB <- spatialreg::impacts(BSDM, tr=trs)
impsB

## Impact measures (mixed, trace):
##                    Direct     Indirect        Total
## log(area)    -0.023152974  0.113307132  0.090154158
## log(Dcbd)    -0.032380034 -0.246041656 -0.278421690
## log(Dele)     0.015984059 -0.158467681 -0.142483622
## log(Dpark)   -0.024342363 -0.300567009 -0.324909372
## log(Dsubway) -0.152068596 -0.155639100 -0.307707697
## crimerate     0.005035336  0.002922671  0.007958007
## popden        0.005458656  0.044656804  0.050115460
## fyear2004    -0.200777988  0.425261787  0.224483799
## fyear2005    -0.038432141  0.897514140  0.859081998
## fyear2006    -0.122970442  0.228386762  0.105416320
## fyear2007     0.706575015 -0.303635058  0.402939957
## fyear2008     0.466039811 -0.284447551  0.181592260
## fyear2009     2.265015077  1.070339228  3.335354305

Total impact of log(area)

plot(density(impsB$sres$total[,1]), lty=4, col="orange", main="log(area) total impacts", ylim=c(0, 10))
lines(density(imps$sres$total[,1]), lty=3, col="red")
abline(v=imps$res$total[1], lty=3, col="red")
abline(v=impsB$res$total[1], lty=4, col="orange")
abline(v=mean(p1-p0), lty=2, col="blue")
abline(v=imps_SLX$impacts$total[1], lwd=2, col="green")
curve(dnorm(x, mean=imps_SLX$impacts$total[1], sd=imps_SLX$se$total[1]), col="green", lwd=2, add=TRUE, from=-0.2, to=0.4)
legend("topleft", legend=c("BSDM", "SDM (tr)", "SDM (pred)", "SLX"), lty=c(4,3,2,1), col=c("orange", "red", "blue", "green"), lwd=c(1,1,1,2), bty="n", cex=0.8)

Here we show the sampled distributions from the fitted ML model, the MCMC total impacts (red and orange dotted lines), the blue point total impact from prediction difference, and the point estimate and implied distrubution from the SLX model (in green).

General Nested Model (GNM)

The GNM is a SAC (both lag and error coefficients, aka SARAR) with spatially lagged independent variables. It suffers from the same problems as all SARAR - a tendency for the two coefficients to flip

GNM <- sacsarlm(form, data=landprice2, listw=lw, type="sacmixed", interval1=int, interval2=int,
 control=list(pre_eig1=e, pre_eig2=e), zero.policy=TRUE)

Impacts

SAC and GNS fitted models also need impacts computed:

spatialreg::impacts(GNM, tr=trs)

## Impact measures (sacmixed, trace):
##                    Direct      Indirect        Total
## log(area)    -0.024993063  0.0926036289  0.067610566
## log(Dcbd)    -0.500457644  0.2378651652 -0.262592479
## log(Dele)    -0.001220104 -0.1261306225 -0.127350726
## log(Dpark)   -0.071096460 -0.2332392492 -0.304335709
## log(Dsubway) -0.166025683 -0.1191637251 -0.285189408
## crimerate     0.006327946 -0.0002760188  0.006051927
## popden        0.004718665  0.0403153345  0.045034000
## fyear2004    -0.204446287  0.2477368775  0.043290590
## fyear2005    -0.044550665  0.5929889181  0.548438253
## fyear2006    -0.130205529  0.0431112527 -0.087094276
## fyear2007     0.697252303 -0.4543278450  0.242924458
## fyear2008     0.482625151 -0.1600300739  0.322595077
## fyear2009     2.293721029  0.8036967234  3.097417752

Tests between GNM, SDM and SDEM

cat(capture.output(LR.Sarlm(SDM, GNM))[c(7,8,5)], sep="\n")

## Log likelihood of SDM Log likelihood of GNM 
##             -1296.823             -1296.248 
## Likelihood ratio = -1.1504, df = 1, p-value = 0.2835

cat(capture.output(LR.Sarlm(SDEM, GNM))[c(7,8,5)], sep="\n")

## Log likelihood of SDEM  Log likelihood of GNM 
##              -1296.305              -1296.248 
## Likelihood ratio = -0.1148, df = 1, p-value = 0.7347

Bayesian spatial econometrics

# install INLA 
# install.packages("INLA", repos="https://inla.r-inla-download.org/R/stable")
library(INLA)

## Loading required package: foreach

## Loading required package: parallel

## This is INLA_21.03.21 built 2021-03-20 18:00:18 UTC.
##  - See www.r-inla.org/contact-us for how to get help.
##  - To enable PARDISO sparse library; see inla.pardiso()

## 
## Attaching package: 'INLA'

## The following object is masked _by_ '.GlobalEnv':
## 
##     f

library(INLABMA)

Columbus toy data

data(oldcol, package="spdep")
lw <- spdep::nb2listw(COL.nb, style="W")
ev <- eigenw(similar.listw(lw))
W <- spdep::listw2mat(lw)
sW <- as(lw, "CsparseMatrix")
tr <- trW(sW)
m.form <-  CRIME ~ INC + HOVAL
COL.OLD$LAG_INC <- lag(lw, COL.OLD$INC)
COL.OLD$LAG_HOVAL <- lag(lw, COL.OLD$HOVAL)
COL.OLD$idx <- 1:nrow(COL.OLD)
dm.form <- CRIME ~ INC + HOVAL + LAG_INC + LAG_HOVAL

Simulation framework

titer <- 2000L
nomit <- 1000L
niter <- titer - nomit
thin <- 1L

Check Durbin

coefficients(lmSLX(m.form, data=COL.OLD, listw=lw))

## (Intercept)         INC       HOVAL     lag.INC   lag.HOVAL 
##  75.0287479  -1.1089293  -0.2897283  -1.3709725   0.1917608

coefficients(lm(dm.form, data=COL.OLD))

## (Intercept)         INC       HOVAL     LAG_INC   LAG_HOVAL 
##  75.0287479  -1.1089293  -0.2897283  -1.3709725   0.1917608

Try hglm

library(hglm)
suppressWarnings(sem.hglm <- hglm(fixed=m.form, random= ~ 1 | I(as.factor(idx)),
                data=COL.OLD, rand.family=SAR(D=sW), sparse=TRUE, verbose=FALSE))

HGLM SEM

summary(sem.hglm)

## Call: 
## hglm.formula(rand.family = SAR(D = sW), fixed = m.form, random = ~1 | 
##     I(as.factor(idx)), data = COL.OLD, sparse = TRUE, verbose = FALSE)
## 
## ----------
## MEAN MODEL
## ----------
## 
## Summary of the fixed effects estimates:
## 
##             Estimate Std. Error t-value Pr(>|t|)    
## (Intercept) 62.50869    6.21608  10.056 6.99e-07 ***
## INC         -0.93132    0.33604  -2.771  0.01818 *  
## HOVAL       -0.31695    0.09674  -3.276  0.00738 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Note: P-values are based on 11 degrees of freedom
## 
## Summary of the random effects estimates:
## 
##      Estimate Std. Error
## [1,]  -9.8292    32.5390
## [2,] -12.7254     5.7258
## [3,]  -1.8897     7.0867
## ...
## NOTE: to show all the random effects, use print(summary(hglm.object), print.ranef = TRUE).
## 
## ----------------
## DISPERSION MODEL
## ----------------
## 
## NOTE: h-likelihood estimates through EQL can be biased.
## 
## Dispersion parameter for the mean model:
## [1] 22.30184
## 
## Model estimates for the dispersion term:
## 
## Link = log 
## 
## Effects:
##   Estimate Std. Error 
##     3.1047     0.4182 
## 
## Dispersion = 1 is used in Gamma model on deviances to calculate the standard error(s).
## 
## Dispersion parameter for the random effects:
## [1] 1.119
## 
## Dispersion model for the random effects:
## 
## Link = log
## 
## Effects:
## .|Random1 
##                        Estimate Std. Error
## 1/sqrt(SAR.tau)          0.1120     0.0144
## -SAR.rho/sqrt(SAR.tau)  -0.0813     0.0202
## SAR.tau (estimated spatial variance component): 79.67263 
## SAR.rho (estimated spatial correlation): 0.7254112 
## 
## Dispersion = 1 is used in Gamma model on deviances to calculate the standard error(s).
## 
## EQL estimation converged in 49 iterations.

ML SEM

sem.ml <- errorsarlm(m.form, data = COL.OLD, listw=lw, control=list(pre_eig=ev))
summary(sem.ml)

## 
## Call:
## errorsarlm(formula = m.form, data = COL.OLD, listw = lw, control = list(pre_eig = ev))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -34.81174  -6.44031  -0.72142   7.61476  23.33626 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 59.893219   5.366163 11.1613 < 2.2e-16
## INC         -0.941312   0.330569 -2.8476 0.0044057
## HOVAL       -0.302250   0.090476 -3.3407 0.0008358
## 
## Lambda: 0.56179, LR test value: 7.9935, p-value: 0.0046945
## Asymptotic standard error: 0.13387
##     z-value: 4.1966, p-value: 2.7098e-05
## Wald statistic: 17.611, p-value: 2.7098e-05
## 
## Log likelihood: -183.3805 for error model
## ML residual variance (sigma squared): 95.575, (sigma: 9.7762)
## Number of observations: 49 
## Number of parameters estimated: 5 
## AIC: 376.76, (AIC for lm: 382.75)

SET SEM (griddy Gibbs I)

sem.SET0 <- spBreg_err(m.form, data = COL.OLD, listw=lw,
              control=list(ndraw=titer, nomit=nomit, prior=list(gG_sige=TRUE)))

summary(sem.SET0)

## 
## Iterations = 1001:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                 Mean      SD Naive SE Time-series SE
## (Intercept)  59.9029  7.8101 0.246976       0.246976
## INC          -0.9603  0.4056 0.012825       0.012825
## HOVAL        -0.2986  0.1006 0.003183       0.003331
## lambda        0.5605  0.1612 0.005098       0.004919
## sige        115.2855 26.3540 0.833387       0.833387
## 
## 2. Quantiles for each variable:
## 
##                2.5%     25%      50%      75%    97.5%
## (Intercept) 43.2633 55.8237  60.3718  64.7648  72.7576
## INC         -1.7571 -1.2347  -0.9665  -0.6827  -0.1898
## HOVAL       -0.4905 -0.3662  -0.3005  -0.2312  -0.1091
## lambda       0.2206  0.4635   0.5648   0.6783   0.8450
## sige        75.5521 96.2848 111.2165 129.9818 177.1121

library(coda)
raftery.diag(sem.SET0, r=0.01)

## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                    
##              Burn-in  Total Lower bound  Dependence
##              (M)      (N)   (Nmin)       factor (I)
##  (Intercept) 2        969   937          1.030     
##  INC         2        893   937          0.953     
##  HOVAL       2        893   937          0.953     
##  lambda      3        1085  937          1.160     
##  sige        2        893   937          0.953

geweke.diag(sem.SET0)

## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## (Intercept)         INC       HOVAL      lambda        sige 
##     -0.2797     -0.4941      1.0452      0.6519     -0.3323

SET SEM (griddy Gibbs II)

sem.SET1 <- spBreg_err(m.form, data = COL.OLD, listw=lw,
              control=list(ndraw=titer, nomit=nomit, prior=list(gG_sige=FALSE)))

summary(sem.SET1)

## 
## Iterations = 1001:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                 Mean       SD Naive SE Time-series SE
## (Intercept)  59.6538  7.90206 0.249885       0.249885
## INC          -0.9532  0.41157 0.013015       0.013015
## HOVAL        -0.3043  0.09713 0.003071       0.003071
## lambda        0.5632  0.15631 0.004943       0.004943
## sige        115.8606 27.20206 0.860205       0.905463
## 
## 2. Quantiles for each variable:
## 
##                2.5%     25%      50%      75%    97.5%
## (Intercept) 45.3150 55.2312  59.7313  64.4779  74.1728
## INC         -1.7881 -1.2171  -0.9587  -0.6703  -0.1757
## HOVAL       -0.4848 -0.3713  -0.3086  -0.2372  -0.1206
## lambda       0.2116  0.4697   0.5778   0.6638   0.8242
## sige        75.1591 96.7529 111.2610 130.0492 182.6293

raftery.diag(sem.SET1, r=0.01)

## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                    
##              Burn-in  Total Lower bound  Dependence
##              (M)      (N)   (Nmin)       factor (I)
##  (Intercept) 2        893   937          0.953     
##  INC         3        1053  937          1.120     
##  HOVAL       2        969   937          1.030     
##  lambda      2        893   937          0.953     
##  sige        2        893   937          0.953

geweke.diag(sem.SET1)

## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## (Intercept)         INC       HOVAL      lambda        sige 
##     -0.4078      0.4890     -0.4346      1.1084      0.2870

SET SEM (MH)

sem.SET2 <- spBreg_err(m.form, data = COL.OLD, listw=lw,
              control=list(ndraw=titer, nomit=nomit, prior=list(lambdaMH=TRUE)))

summary(sem.SET2)

## 
## Iterations = 1001:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                 Mean       SD Naive SE Time-series SE
## (Intercept)  59.5043  7.65026 0.241922       0.303818
## INC          -0.9236  0.38659 0.012225       0.017650
## HOVAL        -0.2978  0.09687 0.003063       0.003063
## lambda        0.6117  0.15043 0.004757       0.013109
## sige        116.4182 26.03271 0.823226       0.969403
## 
## 2. Quantiles for each variable:
## 
##                2.5%     25%      50%      75%    97.5%
## (Intercept) 44.0357 54.7416  59.6063  64.3279  72.9460
## INC         -1.6562 -1.1765  -0.9255  -0.6643  -0.1867
## HOVAL       -0.4915 -0.3636  -0.3001  -0.2315  -0.1209
## lambda       0.2747  0.5170   0.6214   0.7162   0.8613
## sige        77.1463 98.2394 113.3085 130.2738 177.5965

raftery.diag(sem.SET2, r=0.01)

## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                    
##              Burn-in  Total Lower bound  Dependence
##              (M)      (N)   (Nmin)       factor (I)
##  (Intercept) 2        969   937          1.030     
##  INC         3        1053  937          1.120     
##  HOVAL       2        893   937          0.953     
##  lambda      19       5210  937          5.560     
##  sige        2        969   937          1.030

geweke.diag(sem.SET2)

## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## (Intercept)         INC       HOVAL      lambda        sige 
##     -1.1270      0.9452     -0.1921      0.8126      0.1992

INLA setup

n <- nrow(COL.OLD)
rho.max <- 1/max(Re(ev))
rho.min <- 1/min(Re(ev))
rho <- mean(c(rho.min, rho.max))

INLA_slm SEM setup

zero.variance = list(prec=list(initial = 25, fixed=TRUE))
args.slm <- list(rho.min = rho.min, rho.max = rho.max, W = sW,
              X = matrix(0, nrow=n, ncol=0), Q.beta = matrix(1,0,0))
hyper.slm <- list(prec = list(prior = "loggamma", param = c(0.01, 0.01)),
              rho = list(initial=rho, prior = "logitbeta", param = c(1,1)))

INLA_slm SEM

system.time(sem.inla_slm <- inla(update(m.form, . ~ . + f(idx, model="slm",
                  args.slm=args.slm, hyper=hyper.slm)), data = COL.OLD,
                  family="gaussian", control.family = list(hyper=zero.variance),
                  control.compute=list(dic=TRUE, cpo=TRUE)))

## Warning in inla.model.properties.generic(inla.trim.family(model), mm[names(mm) == : Model 'slm' in section 'latent' is marked as 'experimental'; changes may appear at any time.
##   Use this model with extra care!!! Further warnings are disabled.

print(summary(sem.inla_slm))

## 
## Call:
##    c("inla(formula = update(m.form, . ~ . + f(idx, model = \"slm\", 
##    args.slm = args.slm, ", " hyper = hyper.slm)), family = \"gaussian\", 
##    data = COL.OLD, ", " control.compute = list(dic = TRUE, cpo = TRUE), 
##    control.family = list(hyper = zero.variance))" ) 
## Time used:
##     Pre = 0.381, Running = 1.76, Post = 0.0564, Total = 2.2 
## Fixed effects:
##               mean    sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept) 59.728 6.590     46.094   59.951     72.081 60.367   0
## INC         -0.937 0.383     -1.704   -0.933     -0.194 -0.923   0
## HOVAL       -0.301 0.095     -0.489   -0.301     -0.114 -0.302   0
## 
## Random effects:
##   Name     Model
##     idx SLM model
## 
## Model hyperparameters:
##                    mean    sd 0.025quant 0.5quant 0.975quant  mode
## Precision for idx 0.007 0.001      0.006    0.007      0.009 0.007
## Rho for idx       0.857 0.024      0.804    0.859      0.897 0.865
## 
## Expected number of effective parameters(stdev): 49.00(0.00)
## Number of equivalent replicates : 1.00 
## 
## Deviance Information Criterion (DIC) ...............: -1036.71
## Deviance Information Criterion (DIC, saturated) ....: 2405.91
## Effective number of parameters .....................: 48.91
## 
## Marginal log-Likelihood:  -201.63 
## CPO and PIT are computed
## 
## Posterior marginals for the linear predictor and
##  the fitted values are computed

ff<- function(z){z*(rho.max-rho.min)+rho.min}
semmarg <- inla.tmarginal(ff, sem.inla_slm$marginals.hyperpar[[2]]) 
c(mean=mean(semmarg[, "x"]), se=sd(semmarg[, "x"]))

##      mean        se 
## 0.6370825 0.0604782

SEM coefficients and SEs

sem_res

##             sem_ml_coefs sem_hglm_coefs sem_SET0_coefs sem_SET1_coefs
## (Intercept)  59.89321925    62.50869055     59.9028705    59.65378219
##               5.36616252     6.21608063      7.8100691     7.90205803
## INC          -0.94131196    -0.93132065     -0.9603383    -0.95319851
##               0.33056857     0.33603590      0.4055770     0.41156824
## HOVAL        -0.30225021    -0.31694784     -0.2985639    -0.30430968
##               0.09047605     0.09674369      0.1006416     0.09712605
## lambda        0.56179027     0.72541120      0.5604982     0.56322161
##               0.13386868             NA      0.1612004     0.15630614
## s2           95.57450122             NA    115.2854888   115.86057419
##                       NA             NA     26.3540030    27.20206056
##             sem_SET2_coefs sem_inla_slm_coefs
## (Intercept)    59.50429031         59.7280000
##                 7.65025719          6.5900000
## INC            -0.92357879         -0.9370000
##                 0.38659093          0.3830000
## HOVAL          -0.29777749         -0.3010000
##                 0.09687372          0.0950000
## lambda          0.61168224          0.6370825
##                 0.15043207          0.0604782
## s2            116.41819083                 NA
##                26.03270594                 NA

SDEM ML

sdem.ml <- errorsarlm(m.form, data = COL.OLD, listw=lw, Durbin=TRUE, 
            control=list(pre_eig=ev))
summary(sdem.ml)

## 
## Call:errorsarlm(formula = m.form, data = COL.OLD, listw = lw, Durbin = TRUE, 
##     control = list(pre_eig = ev))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -37.31635  -6.54376  -0.22212   6.44591  23.15801 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 73.545133   8.783543  8.3731 < 2.2e-16
## INC         -1.051673   0.319514 -3.2915 0.0009966
## HOVAL       -0.275608   0.091151 -3.0236 0.0024976
## lag.INC     -1.156711   0.578629 -1.9991 0.0456024
## lag.HOVAL    0.111691   0.198993  0.5613 0.5746048
## 
## Lambda: 0.4254, LR test value: 4.9871, p-value: 0.025537
## Asymptotic standard error: 0.15842
##     z-value: 2.6852, p-value: 0.0072485
## Wald statistic: 7.2103, p-value: 0.0072485
## 
## Log likelihood: -181.5846 for error model
## ML residual variance (sigma squared): 92.531, (sigma: 9.6193)
## Number of observations: 49 
## Number of parameters estimated: 7 
## AIC: 377.17, (AIC for lm: 380.16)

sdem.ml_imps <- summary(spatialreg::impacts(sdem.ml), zstats=TRUE, short=TRUE)
a1 <- expand.grid(colnames(sdem.ml_imps$mat), rownames(sdem.ml_imps$mat))
imp_nms <- c(t(cbind(paste(as.character(a1$Var1), as.character(a1$Var2), sep="_"), 
            c(" ", "  ", "   ", "    ", "     ", "      "))))
print(sdem.ml_imps)

## Impact measures (SDEM, estimable, n):
##           Direct   Indirect      Total
## INC   -1.0516727 -1.1567109 -2.2083836
## HOVAL -0.2756084  0.1116912 -0.1639172
## ========================================================
## Standard errors:
##           Direct  Indirect     Total
## INC   0.31951388 0.5786287 0.6478636
## HOVAL 0.09115142 0.1989927 0.2346288
## ========================================================
## Z-values:
##          Direct   Indirect      Total
## INC   -3.291477 -1.9990554 -3.4087171
## HOVAL -3.023633  0.5612828 -0.6986236
## 
## p-values:
##       Direct     Indirect Total     
## INC   0.00099663 0.045602 0.00065269
## HOVAL 0.00249759 0.574605 0.48478731

SDEM SET (griddy Gibbs I)

system.time(sdem.SET0 <- spBreg_err(m.form, data = COL.OLD, listw=lw,
              Durbin=TRUE, control=list(ndraw=titer, nomit=nomit,
              prior=list(gG_sige=TRUE))))

##    user  system elapsed 
##   1.006   0.000   1.027

summary(sdem.SET0)

## 
## Iterations = 1001:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                 Mean      SD Naive SE Time-series SE
## (Intercept)  72.4409 11.2962 0.357217       0.374313
## INC          -1.0224  0.3781 0.011956       0.011956
## HOVAL        -0.2780  0.1019 0.003221       0.003221
## lag.INC      -1.0891  0.7170 0.022675       0.022675
## lag.HOVAL     0.1041  0.2391 0.007561       0.007912
## lambda        0.4874  0.1684 0.005325       0.005325
## sige        113.7969 27.2639 0.862160       0.918867
## 
## 2. Quantiles for each variable:
## 
##                2.5%      25%      50%      75%     97.5%
## (Intercept) 48.3612 65.61389  72.8401  79.9463  93.99314
## INC         -1.7647 -1.27619  -1.0295  -0.7651  -0.24937
## HOVAL       -0.4921 -0.34357  -0.2725  -0.2116  -0.07964
## lag.INC     -2.4666 -1.55715  -1.1134  -0.6252   0.45690
## lag.HOVAL   -0.3985 -0.04706   0.1098   0.2616   0.56673
## lambda       0.1256  0.38469   0.4877   0.6051   0.79302
## sige        73.5526 94.74318 109.5550 128.0236 176.57728

sdem.SET0_imps <- summary(spatialreg::impacts(sdem.SET0), zstats=TRUE, short=TRUE)
print(sdem.SET0_imps)

## Impact measures (SDEM, MCMC, n):
##           Direct   Indirect      Total
## INC   -1.0223705 -1.0890699 -2.1114404
## HOVAL -0.2779983  0.1040978 -0.1739005
## ========================================================
## Standard errors:
##           Direct  Indirect     Total
## INC   0.35828605 0.6794660 0.8193536
## HOVAL 0.09652069 0.2265788 0.2711647
## ========================================================
## Z-values:
##          Direct   Indirect      Total
## INC   -2.853504 -1.6028320 -2.5769587
## HOVAL -2.880194  0.4594332 -0.6413096
## 
## p-values:
##       Direct    Indirect Total    
## INC   0.0043240 0.10897  0.0099674
## HOVAL 0.0039743 0.64592  0.5213216

raftery.diag(sdem.SET0, r=0.01)

## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                    
##              Burn-in  Total Lower bound  Dependence
##              (M)      (N)   (Nmin)       factor (I)
##  (Intercept) 2        893   937          0.953     
##  INC         3        1053  937          1.120     
##  HOVAL       2        893   937          0.953     
##  lag.INC     2        893   937          0.953     
##  lag.HOVAL   3        1053  937          1.120     
##  lambda      2        1001  937          1.070     
##  sige        2        893   937          0.953

geweke.diag(sdem.SET0)

## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## (Intercept)         INC       HOVAL     lag.INC   lag.HOVAL      lambda 
##     0.12084     1.66094    -0.59641     0.74880    -1.27506     0.56032 
##        sige 
##     0.04769

SDEM SET (griddy Gibbs II)

sdem.SET1 <- spBreg_err(m.form, data = COL.OLD, listw=lw, Durbin=TRUE,
              control=list(ndraw=titer, nomit=nomit, prior=list(gG_sige=FALSE)))

summary(sdem.SET1)

## 
## Iterations = 1001:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                 Mean      SD Naive SE Time-series SE
## (Intercept)  72.3496 11.1308 0.351986       0.351986
## INC          -1.0174  0.3734 0.011807       0.012417
## HOVAL        -0.2759  0.1053 0.003331       0.003331
## lag.INC      -1.0896  0.6963 0.022020       0.022020
## lag.HOVAL     0.1020  0.2229 0.007047       0.007047
## lambda        0.4775  0.1771 0.005602       0.005602
## sige        114.3977 26.5740 0.840343       0.881735
## 
## 2. Quantiles for each variable:
## 
##                 2.5%      25%      50%      75%     97.5%
## (Intercept) 49.47217 65.41648  72.8738  79.2084  94.19839
## INC         -1.78985 -1.26050  -1.0239  -0.7681  -0.28767
## HOVAL       -0.47861 -0.34447  -0.2760  -0.2046  -0.08333
## lag.INC     -2.41127 -1.53507  -1.1196  -0.6472   0.26917
## lag.HOVAL   -0.32887 -0.04414   0.1048   0.2425   0.52867
## lambda       0.09642  0.35368   0.4867   0.6006   0.79790
## sige        74.66299 95.37983 110.6347 129.0282 178.51192

sdem.SET1_imps <- summary(spatialreg::impacts(sdem.SET1), zstats=TRUE, short=TRUE)
print(sdem.SET1_imps)

## Impact measures (SDEM, MCMC, n):
##           Direct  Indirect      Total
## INC   -1.0174315 -1.089617 -2.1070486
## HOVAL -0.2759088  0.102023 -0.1738858
## ========================================================
## Standard errors:
##           Direct  Indirect     Total
## INC   0.35380380 0.6598422 0.7835618
## HOVAL 0.09982179 0.2111832 0.2546694
## ========================================================
## Z-values:
##          Direct  Indirect      Total
## INC   -2.875694 -1.651330 -2.6890649
## HOVAL -2.764014  0.483102 -0.6827905
## 
## p-values:
##       Direct    Indirect Total    
## INC   0.0040314 0.098671 0.0071652
## HOVAL 0.0057095 0.629023 0.4947393

raftery.diag(sdem.SET1, r=0.01)

## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                    
##              Burn-in  Total Lower bound  Dependence
##              (M)      (N)   (Nmin)       factor (I)
##  (Intercept) 2        893   937          0.953     
##  INC         3        1053  937          1.120     
##  HOVAL       2        893   937          0.953     
##  lag.INC     2        893   937          0.953     
##  lag.HOVAL   2        893   937          0.953     
##  lambda      2        969   937          1.030     
##  sige        3        1053  937          1.120

geweke.diag(sdem.SET1)

## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## (Intercept)         INC       HOVAL     lag.INC   lag.HOVAL      lambda 
##     0.04962     0.53485    -0.22905     2.31710    -2.31338     1.32696 
##        sige 
##     1.29281

SDEM SET (MH)

sdem.SET2 <- spBreg_err(m.form, data = COL.OLD, listw=lw, Durbin=TRUE,
              control=list(ndraw=titer, nomit=nomit, prior=list(lambdaMH=TRUE)))

summary(sdem.SET2)

## 
## Iterations = 1001:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                  Mean      SD Naive SE Time-series SE
## (Intercept)  72.43159 13.5994 0.430049       0.430049
## INC          -0.98092  0.3920 0.012397       0.013547
## HOVAL        -0.27869  0.1093 0.003455       0.003455
## lag.INC      -1.00347  0.7964 0.025186       0.027151
## lag.HOVAL     0.05652  0.2513 0.007947       0.007947
## lambda        0.57931  0.1730 0.005472       0.013492
## sige        117.46819 28.5541 0.902961       1.061350
## 
## 2. Quantiles for each variable:
## 
##                2.5%     25%       50%      75%    97.5%
## (Intercept) 43.4749 65.2508  73.14647  81.0387  96.4330
## INC         -1.7125 -1.2489  -1.00162  -0.7413  -0.1589
## HOVAL       -0.4855 -0.3506  -0.27507  -0.2058  -0.0638
## lag.INC     -2.5516 -1.5054  -1.03322  -0.5135   0.7180
## lag.HOVAL   -0.4473 -0.1027   0.06309   0.2170   0.5469
## lambda       0.2264  0.4590   0.58358   0.7136   0.8937
## sige        73.7221 97.6762 113.07939 133.0273 184.7009

sdem.SET2_imps <- summary(spatialreg::impacts(sdem.SET2), zstats=TRUE, short=TRUE)
print(sdem.SET2_imps)

## Impact measures (SDEM, MCMC, n):
##           Direct    Indirect      Total
## INC   -0.9809190 -1.00346727 -1.9843863
## HOVAL -0.2786857  0.05651804 -0.2221677
## ========================================================
## Standard errors:
##          Direct  Indirect     Total
## INC   0.3714824 0.7547214 0.9285267
## HOVAL 0.1035426 0.2381483 0.2929530
## ========================================================
## Z-values:
##          Direct   Indirect     Total
## INC   -2.640553 -1.3295863 -2.137134
## HOVAL -2.691509  0.2373229 -0.758373
## 
## p-values:
##       Direct    Indirect Total   
## INC   0.0082771 0.18365  0.032587
## HOVAL 0.0071130 0.81241  0.448228

raftery.diag(sdem.SET2, r=0.01)

## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                    
##              Burn-in  Total Lower bound  Dependence
##              (M)      (N)   (Nmin)       factor (I)
##  (Intercept) 2        969   937          1.030     
##  INC         2        893   937          0.953     
##  HOVAL       3        1053  937          1.120     
##  lag.INC     2        893   937          0.953     
##  lag.HOVAL   2        893   937          0.953     
##  lambda      13       3523  937          3.760     
##  sige        2        969   937          1.030

geweke.diag(sdem.SET2)

## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## (Intercept)         INC       HOVAL     lag.INC   lag.HOVAL      lambda 
##    -0.78451    -0.94483     2.01672    -0.37017     0.64247    -0.08677 
##        sige 
##    -0.23734

SDEM INLA_slm

lc1 <- with(COL.OLD, inla.make.lincomb(INC=+1, LAG_INC=+1))
names(lc1) <- "TOT_INC"
lc2 <- with(COL.OLD, inla.make.lincomb(HOVAL=+1, LAG_HOVAL=+1))
names(lc2) <- "TOT_HOVAL"

system.time(sdem.inla_slm <- inla(update(dm.form, . ~ . + f(idx, model="slm",
                  args.slm=args.slm, hyper=hyper.slm)), data = COL.OLD,
                  lincomb = c(lc1, lc2),
                  family="gaussian", control.family = list(hyper=zero.variance),
                  control.compute=list(dic=TRUE, cpo=TRUE, config=TRUE)))

print(summary(sdem.inla_slm))

## 
## Call:
##    c("inla(formula = update(dm.form, . ~ . + f(idx, model = \"slm\", ", " 
##    args.slm = args.slm, hyper = hyper.slm)), family = \"gaussian\", ", " 
##    data = COL.OLD, lincomb = c(lc1, lc2), control.compute = list(dic = 
##    TRUE, ", " cpo = TRUE, config = TRUE), control.family = list(hyper = 
##    zero.variance))" ) 
## Time used:
##     Pre = 0.356, Running = 3.74, Post = 1.35, Total = 5.45 
## Fixed effects:
##               mean     sd 0.025quant 0.5quant 0.975quant   mode   kld
## (Intercept) 66.554 21.511     23.942   66.631    108.677 66.740 0.002
## INC         -0.831  0.437     -1.691   -0.832      0.029 -0.832 0.000
## HOVAL       -0.298  0.120     -0.535   -0.298     -0.061 -0.298 0.000
## LAG_INC     -0.558  1.011     -2.543   -0.559      1.434 -0.560 0.000
## LAG_HOVAL   -0.012  0.279     -0.561   -0.012      0.536 -0.012 0.000
## 
## Linear combinations (derived):
##           ID   mean    sd 0.025quant 0.5quant 0.975quant   mode kld
## TOT_INC    1 -1.389 1.314     -3.969   -1.391      1.203 -1.394   0
## TOT_HOVAL  2 -0.310 0.364     -1.028   -0.310      0.406 -0.310   0
## 
## Random effects:
##   Name     Model
##     idx SLM model
## 
## Model hyperparameters:
##                    mean    sd 0.025quant 0.5quant 0.975quant  mode
## Precision for idx 0.008 0.002      0.005    0.009      0.011 0.009
## Rho for idx       0.986 0.002      0.982    0.986      0.988 0.987
## 
## Expected number of effective parameters(stdev): 49.00(0.00)
## Number of equivalent replicates : 1.00 
## 
## Deviance Information Criterion (DIC) ...............: -1036.71
## Deviance Information Criterion (DIC, saturated) ....: 2405.91
## Effective number of parameters .....................: 48.91
## 
## Marginal log-Likelihood:  -217.22 
## CPO and PIT are computed
## 
## Posterior marginals for the linear predictor and
##  the fitted values are computed

sdemmarg <- inla.tmarginal(ff, sdem.inla_slm$marginals.hyperpar[[2]]) 
c(mean=mean(sdemmarg[, "x"]), se=sd(sdemmarg[, "x"]))

##        mean          se 
## 0.964336022 0.003921358

system.time(mc.samples <- inla.posterior.sample(niter, sdem.inla_slm))

##    user  system elapsed 
##   9.135   6.229   5.142

smple0 <- t(sapply(mc.samples, function(x) x$latent[100:103]))
colnames(smple0) <- c("Direct_LAG", "Direct_HOVAL", "Indirect_LAG", "Indirect_HOVAL")
smple <- cbind(smple0, Total_LAG=smple0[,1]+smple0[,3], Total_HOVAL=smple0[,2]+smple0[,4])

(sdem.inla_mc_imps1 <- apply(smple, 2, mean)[c(1,3,5,2,4,6)])

##     Direct_LAG   Indirect_LAG      Total_LAG   Direct_HOVAL Indirect_HOVAL 
##     -0.8183172     -0.5521573     -1.3704745     -0.3019328     -0.0154599 
##    Total_HOVAL 
##     -0.3173927

(sdem.inla_mc_imps2 <- apply(smple, 2, sd)[c(1,3,5,2,4,6)])

##     Direct_LAG   Indirect_LAG      Total_LAG   Direct_HOVAL Indirect_HOVAL 
##      0.4514668      1.0384707      1.3474952      0.1232322      0.2714796 
##    Total_HOVAL 
##      0.3594395

SDEM coefficients

sdem_res

##             sdem_ml_coefs sdem_SET0_coefs sdem_SET1_coefs sdem_SET2_coefs
## (Intercept)   73.54513284      72.4408654      72.3496257     72.43158907
##                8.78354327      11.2961999      11.1307694     13.59935421
## INC           -1.05167269      -1.0223705      -1.0174315     -0.98091899
##                0.31951388       0.3780956       0.3733655      0.39202161
## HOVAL         -0.27560842      -0.2779983      -0.2759088     -0.27868572
##                0.09115142       0.1018573       0.1053409      0.10926740
## lag.INC       -1.15671090      -1.0890699      -1.0896171     -1.00346727
##                0.57862875       0.7170335       0.6963247      0.79644975
## lag.HOVAL      0.11169118       0.1040978       0.1020230      0.05651804
##                0.19899272       0.2391063       0.2228594      0.25131548
## lambda         0.42539901       0.4874297       0.4774617      0.57930920
##                0.15842312       0.1683799       0.1771428      0.17304659
## s2            92.53089974     113.7969075     114.3977500    117.46819255
##                        NA      27.2638915      26.5739735     28.55412624
##             sdem_inla_slm_coefs
## (Intercept)        66.554000000
##                    21.511000000
## INC                -0.831000000
##                     0.437000000
## HOVAL              -0.298000000
##                     0.120000000
## lag.INC            -0.558000000
##                     1.011000000
## lag.HOVAL          -0.012000000
##                     0.279000000
## lambda              0.964336022
##                     0.003921358
## s2                           NA
##                              NA

SDEM impacts

print(sdem_imp_res)

##                    sdem.ml   sdem.SET0   sdem.SET1   sdem.SET2 sdem.inla_slm_lc
## Direct_INC     -1.05167269 -1.02237053 -1.01743155 -0.98091899       -0.8310000
##                 0.31951388  0.35828605  0.35380380  0.37148245        0.4370000
## Indirect_INC   -1.15671090 -1.08906990 -1.08961709 -1.00346727       -0.5580000
##                 0.57862875  0.67946604  0.65984225  0.75472142        1.0110000
## Total_INC      -2.20838359 -2.11144043 -2.10704864 -1.98438626       -1.3888023
##                 0.64786356  0.81935362  0.78356184  0.92852675        1.3139672
## Direct_HOVAL   -0.27560842 -0.27799832 -0.27590883 -0.27868572       -0.2980000
##                 0.09115142  0.09652069  0.09982179  0.10354257        0.1200000
## Indirect_HOVAL  0.11169118  0.10409782  0.10202302  0.05651804       -0.0120000
##                 0.19899272  0.22657879  0.21118316  0.23814833        0.2790000
## Total_HOVAL    -0.16391724 -0.17390050 -0.17388581 -0.22216768       -0.3100210
##                 0.23462885  0.27116466  0.25466936  0.29295305        0.3644325
##                sdem.inla_mc
## Direct_INC       -0.8183172
##                   0.4514668
## Indirect_INC     -0.5521573
##                   1.0384707
## Total_INC        -1.3704745
##                   1.3474952
## Direct_HOVAL     -0.3019328
##                   0.1232322
## Indirect_HOVAL   -0.0154599
##                   0.2714796
## Total_HOVAL      -0.3173927
##                   0.3594395

SLM ML

slm.ml <- lagsarlm(m.form, data = COL.OLD, listw=lw, control=list(pre_eig=ev))
summary(slm.ml)

## 
## Call:
## lagsarlm(formula = m.form, data = COL.OLD, listw = lw, control = list(pre_eig = ev))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -37.68585  -5.35636   0.05421   6.02013  23.20555 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 45.079251   7.177347  6.2808 3.369e-10
## INC         -1.031616   0.305143 -3.3808 0.0007229
## HOVAL       -0.265926   0.088499 -3.0049 0.0026570
## 
## Rho: 0.43102, LR test value: 9.9736, p-value: 0.001588
## Asymptotic standard error: 0.11768
##     z-value: 3.6626, p-value: 0.00024962
## Wald statistic: 13.415, p-value: 0.00024962
## 
## Log likelihood: -182.3904 for lag model
## ML residual variance (sigma squared): 95.494, (sigma: 9.7721)
## Number of observations: 49 
## Number of parameters estimated: 5 
## AIC: 374.78, (AIC for lm: 382.75)
## LM test for residual autocorrelation
## test value: 0.31955, p-value: 0.57188

slm.ml_imps <- summary(spatialreg::impacts(slm.ml, tr=tr, R=1000), zstats=TRUE, short=TRUE)
print(slm.ml_imps)

## Impact measures (lag, trace):
##           Direct   Indirect      Total
## INC   -1.0860220 -0.7270848 -1.8131068
## HOVAL -0.2799509 -0.1874253 -0.4673763
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##           Direct  Indirect     Total
## INC   0.31565089 0.3954370 0.5960813
## HOVAL 0.09512891 0.1375705 0.2101898
## 
## Simulated z-values:
##          Direct  Indirect     Total
## INC   -3.465993 -1.971279 -3.143129
## HOVAL -2.959029 -1.529466 -2.340259
## 
## Simulated p-values:
##       Direct     Indirect Total    
## INC   0.00052828 0.048692 0.0016715
## HOVAL 0.00308610 0.126149 0.0192704

SLM SET (griddy Gibbs)

system.time(slm.SET0 <- spBreg_lag(m.form, data = COL.OLD, listw=lw,
              control=list(ndraw=titer, nomit=nomit)))

##    user  system elapsed 
##   0.865   0.027   0.896

summary(slm.SET0)

## 
## Iterations = 1001:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                 Mean       SD Naive SE Time-series SE
## (Intercept)  45.6326  8.28448 0.261978       0.261978
## INC          -1.0420  0.34533 0.010920       0.010138
## HOVAL        -0.2620  0.09305 0.002942       0.003132
## rho           0.4157  0.12915 0.004084       0.004084
## sige        107.8414 23.80501 0.752780       0.801880
## 
## 2. Quantiles for each variable:
## 
##                2.5%     25%      50%      75%     97.5%
## (Intercept) 30.3906 39.6184  45.4544  51.1837  62.52058
## INC         -1.7102 -1.2858  -1.0445  -0.7939  -0.38131
## HOVAL       -0.4441 -0.3233  -0.2643  -0.2018  -0.07704
## rho          0.1495  0.3327   0.4212   0.5088   0.65590
## sige        70.5339 91.3290 104.5761 121.0270 164.05202

slm.SET0_imps <- summary(spatialreg::impacts(slm.SET0, tr=tr), zstats=TRUE, short=TRUE)
print(slm.SET0_imps)

## Impact measures (lag, trace):
##           Direct   Indirect      Total
## INC   -1.0924312 -0.6908227 -1.7832538
## HOVAL -0.2746964 -0.1737102 -0.4484066
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##           Direct  Indirect     Total
## INC   0.36674777 0.5068005 0.7797977
## HOVAL 0.09906053 0.1305946 0.2071283
## 
## Simulated z-values:
##          Direct  Indirect     Total
## INC   -3.005422 -1.524388 -2.404204
## HOVAL -2.799551 -1.500012 -2.284663
## 
## Simulated p-values:
##       Direct    Indirect Total   
## INC   0.0026521 0.12741  0.016208
## HOVAL 0.0051174 0.13361  0.022333

raftery.diag(slm.SET0, r=0.01)

## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                    
##              Burn-in  Total Lower bound  Dependence
##              (M)      (N)   (Nmin)       factor (I)
##  (Intercept) 2        969   937          1.030     
##  INC         2        969   937          1.030     
##  HOVAL       2        893   937          0.953     
##  rho         2        893   937          0.953     
##  sige        2        969   937          1.030

geweke.diag(slm.SET0)

## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## (Intercept)         INC       HOVAL         rho        sige 
##      1.1577     -0.1113     -1.4683     -0.7896      0.9056

SLM SET (MH)

system.time(slm.SET1 <- spBreg_lag(m.form, data = COL.OLD, listw=lw,
              control=list(ndraw=titer, nomit=nomit, prior=list(rhoMH=TRUE))))

##    user  system elapsed 
##   0.732   0.003   0.739

summary(slm.SET1)

## 
## Iterations = 1001:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                 Mean       SD Naive SE Time-series SE
## (Intercept)  45.5493  8.79426 0.278099       1.715790
## INC          -1.0488  0.36618 0.011580       0.044560
## HOVAL        -0.2637  0.09443 0.002986       0.002986
## rho           0.4217  0.14019 0.004433       0.033249
## sige        109.4689 23.95420 0.757498       0.803948
## 
## 2. Quantiles for each variable:
## 
##                2.5%     25%      50%      75%     97.5%
## (Intercept) 27.8764 39.6485  45.5312  51.6086  62.68828
## INC         -1.7524 -1.2933  -1.0674  -0.8182  -0.32457
## HOVAL       -0.4353 -0.3268  -0.2661  -0.2022  -0.08124
## rho          0.1406  0.3290   0.4312   0.5153   0.66443
## sige        71.9152 92.6849 106.0001 123.3212 161.35279

slm.SET1_imps <- summary(spatialreg::impacts(slm.SET1, tr=tr), zstats=TRUE, short=TRUE)
print(slm.SET1_imps)

## Impact measures (lag, trace):
##           Direct   Indirect      Total
## INC   -1.1012963 -0.7123675 -1.8136638
## HOVAL -0.2768525 -0.1790806 -0.4559331
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##          Direct  Indirect     Total
## INC   0.3712551 0.4147001 0.6365957
## HOVAL 0.1013266 0.1515512 0.2277523
## 
## Simulated z-values:
##          Direct  Indirect     Total
## INC   -2.977392 -1.773007 -2.891378
## HOVAL -2.763559 -1.361364 -2.135384
## 
## Simulated p-values:
##       Direct    Indirect Total    
## INC   0.0029071 0.076227 0.0038356
## HOVAL 0.0057175 0.173399 0.0327297

raftery.diag(slm.SET1, r=0.01)

## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                    
##              Burn-in  Total Lower bound  Dependence
##              (M)      (N)   (Nmin)       factor (I)
##  (Intercept) 22       7094  937          7.570     
##  INC         3        1143  937          1.220     
##  HOVAL       2        969   937          1.030     
##  rho         32       8922  937          9.520     
##  sige        2        893   937          0.953

geweke.diag(slm.SET1)

## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## (Intercept)         INC       HOVAL         rho        sige 
##     2.00685    -2.26613     0.29433    -1.88615     0.04327

SLM INLA_slm

mm <- model.matrix(m.form, data=COL.OLD)
betaprec <- .0001
Q.beta = Diagonal(n=ncol(mm), betaprec)
args.slm <- list(rho.min = rho.min, rho.max = rho.max, W = sW, X = mm, Q.beta = Q.beta)
system.time(slm.inla_slm <- inla(CRIME ~ 0 + f(idx, model="slm",
                  args.slm=args.slm, hyper=hyper.slm), data = COL.OLD,
                  family="gaussian", control.family = list(hyper=zero.variance),
                  control.compute=list(dic=TRUE, cpo=TRUE, config=TRUE)))

print(summary(slm.inla_slm))

## 
## Call:
##    c("inla(formula = CRIME ~ 0 + f(idx, model = \"slm\", args.slm = 
##    args.slm, ", " hyper = hyper.slm), family = \"gaussian\", data = 
##    COL.OLD, ", " control.compute = list(dic = TRUE, cpo = TRUE, config = 
##    TRUE), ", " control.family = list(hyper = zero.variance))") 
## Time used:
##     Pre = 0.313, Running = 0.827, Post = 0.337, Total = 1.48 
## Random effects:
##   Name     Model
##     idx SLM model
## 
## Model hyperparameters:
##                    mean    sd 0.025quant 0.5quant 0.975quant  mode
## Precision for idx 0.009 0.003      0.004    0.008      0.015 0.008
## Rho for idx       0.786 0.023      0.740    0.787      0.828 0.788
## 
## Expected number of effective parameters(stdev): 49.00(0.00)
## Number of equivalent replicates : 1.00 
## 
## Deviance Information Criterion (DIC) ...............: -1036.71
## Deviance Information Criterion (DIC, saturated) ....: 2405.91
## Effective number of parameters .....................: 48.91
## 
## Marginal log-Likelihood:  -208.16 
## CPO and PIT are computed
## 
## Posterior marginals for the linear predictor and
##  the fitted values are computed

slmmarg <- inla.tmarginal(ff, slm.inla_slm$marginals.hyperpar[[2]]) 
c(mean=mean(slmmarg[, "x"]), se=sd(slmmarg[, "x"]))

##       mean         se 
## 0.45789685 0.05713219

slm.inla_slm$summary.random$idx[n+1:ncol(mm),]

##    ID       mean         sd 0.025quant   0.5quant  0.975quant       mode
## 50 50 45.8760300 7.65655947 30.8530756 45.8151944 61.04027876 45.5348625
## 51 51 -1.0492613 0.33640283 -1.7200776 -1.0462605 -0.39454448 -1.0396748
## 52 52 -0.2656263 0.09260125 -0.4482717 -0.2655907 -0.08337495 -0.2655114
##             kld
## 50 7.024829e-04
## 51 9.354764e-05
## 52 5.139651e-06

system.time(mc.samples <- inla.posterior.sample(niter, slm.inla_slm))

##    user  system elapsed 
##   5.162   4.666   2.586

smples0 <- t(sapply(mc.samples, function(x) c(x$latent[99:101], 
                                              ff(x$hyperpar[2]), 0)))
colnames(smples0) <- c("(Intercept)", "INC", "HOVAL", "rho", "sige")
smples <- as.mcmc(smples0)
attr(smples, "listw_style") <- "W"
attr(smples, "control")$interval <- c(rho.min, rho.max)
attr(smples, "type") <- "lag"
slm.inla_slm_imps <- summary(spatialreg:::impacts.MCMC_sar_G(smples, tr=tr),
                             zstats=TRUE, short=TRUE)
print(slm.inla_slm_imps)

## Impact measures (lag, trace):
##           Direct   Indirect      Total
## INC   -1.1083674 -0.6934438 -1.8018111
## HOVAL -0.2754759 -0.1723499 -0.4478259
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##           Direct  Indirect     Total
## INC   0.33502216 0.3747234 0.5817865
## HOVAL 0.09714212 0.1271643 0.2000713
## 
## Simulated z-values:
##          Direct  Indirect     Total
## INC   -3.321059 -1.928136 -3.154329
## HOVAL -2.859419 -1.507487 -2.346506
## 
## Simulated p-values:
##       Direct     Indirect Total    
## INC   0.00089677 0.053838 0.0016087
## HOVAL 0.00424418 0.131686 0.0189503

SLM coefficients

slm_res

##             slm_ml_coefs slm_SET0_coefs slm_SET1_coefs slm_inla_slm_coefs
## (Intercept)  45.07925051    45.63255381    45.54931437        45.87602997
##               7.17734654     8.28447816     8.79426490         7.65655947
## INC          -1.03161570    -1.04203042    -1.04879130        -1.04926133
##               0.30514297     0.34533116     0.36618418         0.33640283
## HOVAL        -0.26592625    -0.26202288    -0.26365342        -0.26562628
##               0.08849862     0.09304823     0.09442906         0.09260125
## rho           0.43102320     0.41565783     0.42172783         0.45789685
##               0.11768073     0.12915209     0.14018934         0.05713219
## s2           95.49449671   107.84143322   109.46888841                 NA
##                       NA    23.80500591    23.95419974                 NA

SLM impacts

print(slm_imp_res)

##                     slm.ml    slm.SET0 slm.SET1_impv slm.inla_slm
## Direct_INC     -1.08602200 -1.09243117    -1.1012963  -1.10836737
##                 0.31565089  0.36674777     0.3712551   0.33502216
## Indirect_INC   -0.72708479 -0.69082268    -0.7123675  -0.69344376
##                 0.39543702  0.50680053     0.4147001   0.37472341
## Total_INC      -1.81310679 -1.78325385    -1.8136638  -1.80181114
##                 0.59608129  0.77979768     0.6365957   0.58178650
## Direct_HOVAL   -0.27995092 -0.27469635    -0.2768525  -0.27547592
##                 0.09512891  0.09906053     0.1013266   0.09714212
## Indirect_HOVAL -0.18742535 -0.17371023    -0.1790806  -0.17234995
##                 0.13757051  0.13059456     0.1515512   0.12716430
## Total_HOVAL    -0.46737627 -0.44840658    -0.4559331  -0.44782587
##                 0.21018982  0.20712832     0.2277523   0.20007127

SDM ML

sdm.ml <- lagsarlm(m.form, data = COL.OLD, listw=lw, Durbin=TRUE, 
            control=list(pre_eig=ev))
summary(sdm.ml)

## 
## Call:lagsarlm(formula = m.form, data = COL.OLD, listw = lw, Durbin = TRUE, 
##     control = list(pre_eig = ev))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -37.47829  -6.46731  -0.33835   6.05200  22.62969 
## 
## Type: mixed 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 42.822415  12.667205  3.3806 0.0007233
## INC         -0.914223   0.331094 -2.7612 0.0057586
## HOVAL       -0.293738   0.089212 -3.2926 0.0009927
## lag.INC     -0.520284   0.565129 -0.9206 0.3572355
## lag.HOVAL    0.245640   0.178917  1.3729 0.1697756
## 
## Rho: 0.42634, LR test value: 5.3693, p-value: 0.020494
## Asymptotic standard error: 0.15623
##     z-value: 2.7288, p-value: 0.0063561
## Wald statistic: 7.4465, p-value: 0.0063561
## 
## Log likelihood: -181.3935 for mixed model
## ML residual variance (sigma squared): 91.791, (sigma: 9.5808)
## Number of observations: 49 
## Number of parameters estimated: 7 
## AIC: 376.79, (AIC for lm: 380.16)
## LM test for residual autocorrelation
## test value: 0.28919, p-value: 0.59074

sdm.ml_imps <- summary(spatialreg::impacts(sdm.ml, tr=tr, R=1000), zstats=TRUE, short=TRUE)
print(sdm.ml_imps)

## Impact measures (mixed, trace):
##           Direct  Indirect       Total
## INC   -1.0238910 -1.476711 -2.50060224
## HOVAL -0.2792275  0.195385 -0.08384256
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##           Direct  Indirect     Total
## INC   0.31964300 0.8705975 0.9197001
## HOVAL 0.09114786 0.3252308 0.3607729
## 
## Simulated z-values:
##          Direct  Indirect      Total
## INC   -3.163043 -1.771761 -2.7764867
## HOVAL -3.058603  0.630698 -0.2041804
## 
## Simulated p-values:
##       Direct    Indirect Total   
## INC   0.0015613 0.076434 0.005495
## HOVAL 0.0022237 0.528238 0.838213

SDM SET (griddy Gibbs)

system.time(sdm.SET0 <- spBreg_lag(m.form, data = COL.OLD, listw=lw, Durbin=TRUE,
              control=list(ndraw=titer, nomit=nomit)))

##    user  system elapsed 
##   0.856   0.005   0.864

summary(sdm.SET0)

## 
## Iterations = 1001:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                 Mean       SD Naive SE Time-series SE
## (Intercept)  46.1055 14.09838 0.445830       0.514794
## INC          -0.9386  0.35805 0.011322       0.011053
## HOVAL        -0.2931  0.09729 0.003077       0.003077
## lag.INC      -0.5828  0.63010 0.019925       0.022812
## lag.HOVAL     0.2387  0.18592 0.005879       0.005879
## rho           0.3765  0.16660 0.005268       0.005591
## sige        109.3179 23.73987 0.750721       0.860683
## 
## 2. Quantiles for each variable:
## 
##                 2.5%     25%      50%      75%     97.5%
## (Intercept) 19.53828 36.5842  46.5180  54.9113  74.65694
## INC         -1.66178 -1.1635  -0.9311  -0.6953  -0.29145
## HOVAL       -0.48601 -0.3540  -0.2920  -0.2317  -0.09856
## lag.INC     -1.84180 -1.0252  -0.5841  -0.1742   0.67992
## lag.HOVAL   -0.15500  0.1209   0.2431   0.3579   0.59479
## rho          0.04745  0.2584   0.3852   0.4890   0.69185
## sige        73.02627 92.7043 105.5223 123.1922 162.88012

sdm.SET0_imps <- summary(spatialreg::impacts(sdm.SET0, tr=tr), zstats=TRUE, short=TRUE)
print(sdm.SET0_imps)

## Impact measures (mixed, trace):
##           Direct   Indirect       Total
## INC   -1.0340395 -1.4060873 -2.44012675
## HOVAL -0.2800597  0.1927249 -0.08733482
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##          Direct  Indirect     Total
## INC   0.3731458 1.2721251 1.4104757
## HOVAL 0.1008959 0.3145878 0.3531431
## 
## Simulated z-values:
##          Direct   Indirect      Total
## INC   -2.822983 -1.2461065 -1.8707075
## HOVAL -2.793043  0.5913115 -0.2712414
## 
## Simulated p-values:
##       Direct    Indirect Total   
## INC   0.0047579 0.21273  0.061386
## HOVAL 0.0052215 0.55431  0.786205

raftery.diag(sdm.SET0, r=0.01)

## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                    
##              Burn-in  Total Lower bound  Dependence
##              (M)      (N)   (Nmin)       factor (I)
##  (Intercept) 2        969   937          1.030     
##  INC         2        969   937          1.030     
##  HOVAL       2        969   937          1.030     
##  lag.INC     2        893   937          0.953     
##  lag.HOVAL   3        1053  937          1.120     
##  rho         2        969   937          1.030     
##  sige        3        1053  937          1.120

geweke.diag(sdm.SET0)

## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## (Intercept)         INC       HOVAL     lag.INC   lag.HOVAL         rho 
##     0.60884     0.39423    -2.23812     0.01090     0.06195    -0.59075 
##        sige 
##     0.05199

SDM SET (MH)

system.time(sdm.SET1 <- spBreg_lag(m.form, data = COL.OLD, listw=lw, Durbin=TRUE,
              control=list(ndraw=titer, nomit=nomit, prior=list(rhoMH=TRUE))))

##    user  system elapsed 
##   0.801   0.000   0.805

summary(sdm.SET1)

## 
## Iterations = 1001:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                 Mean       SD Naive SE Time-series SE
## (Intercept)  45.8511 15.40696 0.487211       4.486160
## INC          -0.9410  0.36021 0.011391       0.013852
## HOVAL        -0.2962  0.09508 0.003007       0.003007
## lag.INC      -0.5913  0.65238 0.020630       0.123869
## lag.HOVAL     0.2423  0.19117 0.006045       0.006045
## rho           0.3877  0.19122 0.006047       0.068835
## sige        108.3573 23.30650 0.737016       0.877693
## 
## 2. Quantiles for each variable:
## 
##                  2.5%     25%      50%      75%    97.5%
## (Intercept) 15.690786 35.2072  46.3009  56.1017  75.5416
## INC         -1.667744 -1.1959  -0.9340  -0.6909  -0.2460
## HOVAL       -0.489320 -0.3588  -0.2983  -0.2353  -0.1037
## lag.INC     -1.855550 -1.0437  -0.5953  -0.1347   0.7050
## lag.HOVAL   -0.134312  0.1133   0.2415   0.3723   0.6289
## rho         -0.007453  0.2628   0.3691   0.5274   0.7948
## sige        70.717514 91.6046 105.6254 122.5781 162.0995

sdm.SET1_imps <- summary(spatialreg::impacts(sdm.SET1, tr=tr), zstats=TRUE, short=TRUE)
print(sdm.SET1_imps)

## Impact measures (mixed, trace):
##          Direct   Indirect       Total
## INC   -1.042267 -1.4600656 -2.50233287
## HOVAL -0.282657  0.1947247 -0.08793232
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##          Direct  Indirect     Total
## INC   0.3526054 0.9120520 0.9854826
## HOVAL 0.1002180 0.3583432 0.3977696
## 
## Simulated z-values:
##          Direct   Indirect      Total
## INC   -2.937898 -1.6092273 -2.5404991
## HOVAL -2.834868  0.5402616 -0.2275331
## 
## Simulated p-values:
##       Direct    Indirect Total   
## INC   0.0033045 0.10757  0.011069
## HOVAL 0.0045845 0.58902  0.820009

raftery.diag(sdm.SET1, r=0.01)

## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                    
##              Burn-in  Total Lower bound  Dependence
##              (M)      (N)   (Nmin)       factor (I)
##  (Intercept) 24       4678  937          4.990     
##  INC         3        1053  937          1.120     
##  HOVAL       2        893   937          0.953     
##  lag.INC     3        1053  937          1.120     
##  lag.HOVAL   2        893   937          0.953     
##  rho         25       6703  937          7.150     
##  sige        3        1053  937          1.120

geweke.diag(sdm.SET1)

## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## (Intercept)         INC       HOVAL     lag.INC   lag.HOVAL         rho 
##     -5.9999      2.4992     -0.6331      6.9194      0.2787      4.7773 
##        sige 
##     -1.4206

SDM INLA_slm

mm <- model.matrix(dm.form, data=COL.OLD)
betaprec <- .0001
Q.beta = Diagonal(n=ncol(mm), betaprec)
args.slm <- list(rho.min = rho.min, rho.max = rho.max, W = sW, X = mm, Q.beta = Q.beta)
system.time(sdm.inla_slm <- inla(CRIME ~ 0 + f(idx, model="slm",
                  args.slm=args.slm, hyper=hyper.slm), data = COL.OLD,
                  family="gaussian", control.family = list(hyper=zero.variance),
                  control.compute=list(dic=TRUE, cpo=TRUE, config=TRUE)))

print(summary(sdm.inla_slm))

## 
## Call:
##    c("inla(formula = CRIME ~ 0 + f(idx, model = \"slm\", args.slm = 
##    args.slm, ", " hyper = hyper.slm), family = \"gaussian\", data = 
##    COL.OLD, ", " control.compute = list(dic = TRUE, cpo = TRUE, config = 
##    TRUE), ", " control.family = list(hyper = zero.variance))") 
## Time used:
##     Pre = 0.323, Running = 2.24, Post = 0.894, Total = 3.46 
## Random effects:
##   Name     Model
##     idx SLM model
## 
## Model hyperparameters:
##                    mean    sd 0.025quant 0.5quant 0.975quant  mode
## Precision for idx 0.009 0.001      0.007    0.009      0.012 0.009
## Rho for idx       0.942 0.007      0.928    0.943      0.955 0.943
## 
## Expected number of effective parameters(stdev): 49.00(0.00)
## Number of equivalent replicates : 1.00 
## 
## Deviance Information Criterion (DIC) ...............: -1036.71
## Deviance Information Criterion (DIC, saturated) ....: 2405.91
## Effective number of parameters .....................: 48.91
## 
## Marginal log-Likelihood:  -224.86 
## CPO and PIT are computed
## 
## Posterior marginals for the linear predictor and
##  the fitted values are computed

sdmmarg <- inla.tmarginal(ff, sdm.inla_slm$marginals.hyperpar[[2]]) 
c(mean=mean(sdmmarg[, "x"]), se=sd(sdmmarg[, "x"]))

##      mean        se 
## 0.8533202 0.0172823

sdm.inla_slm$summary.random$idx[n+1:ncol(mm),]

##    ID       mean          sd 0.025quant   0.5quant 0.975quant       mode
## 50 50 42.6174774 14.41909313 17.9912603 41.1461449 73.0250992 37.7587395
## 51 51 -0.9127404  0.37034830 -1.6496933 -0.9102208 -0.1900851 -0.9050847
## 52 52 -0.2934522  0.09819155 -0.4867533 -0.2935121 -0.1000153 -0.2936239
## 53 53 -0.5162238  0.64205432 -1.8233101 -0.4999777  0.7018257 -0.4654768
## 54 54  0.2479965  0.19507389 -0.1381108  0.2486011  0.6304817  0.2498553
##             kld
## 50 5.973626e-04
## 51 2.228635e-05
## 52 8.344655e-06
## 53 1.360082e-04
## 54 1.176557e-05

system.time(mc.samples <- inla.posterior.sample(niter, sdm.inla_slm))

##    user  system elapsed 
##   6.332   5.505   2.599

smples0 <- t(sapply(mc.samples, function(x) c(x$latent[99:103], 
                                              ff(x$hyperpar[2]), 0)))
colnames(smples0) <- c("(Intercept)", "INC", "HOVAL", "LAG_INC", "LAG_HOVAL", "rho", "sige")
smples <- as.mcmc(smples0)
attr(smples, "listw_style") <- "W"
attr(smples, "control")$interval <- c(rho.min, rho.max)
attr(smples, "type") <- "Durbin"

sdm.inla_slm_imps <- summary(spatialreg:::impacts.MCMC_sar_G(smples, tr=tr),
                             zstats=TRUE, short=TRUE)
print(sdm.inla_slm_imps)

## Impact measures (mixed, trace):
##           Direct   Indirect       Total
## INC   -1.0209818 -1.4649977 -2.48597947
## HOVAL -0.2835603  0.2035812 -0.07997903
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##          Direct  Indirect     Total
## INC   0.3596123 0.8913226 0.9698527
## HOVAL 0.1042759 0.3390587 0.3824224
## 
## Simulated z-values:
##          Direct   Indirect      Total
## INC   -2.824088 -1.6487445 -2.5623889
## HOVAL -2.730595  0.5925078 -0.2192345
## 
## Simulated p-values:
##       Direct    Indirect Total   
## INC   0.0047415 0.09920  0.010395
## HOVAL 0.0063220 0.55351  0.826467

SDM coefficients

sdm_res

##             sdm_ml_coefs sdm_SET0_coefs sdm_inla_slm_coefs
## (Intercept)  42.82241459     46.1055361        42.61747742
##              12.66720458     14.0983842        14.41909313
## INC          -0.91422319     -0.9385890        -0.91274041
##               0.33109401      0.3580463         0.37034830
## HOVAL        -0.29373778     -0.2931426        -0.29345217
##               0.08921192      0.0972933         0.09819155
## lag.INC      -0.52028354     -0.5827586        -0.51622385
##               0.56512898      0.6300983         0.64205432
## lag.HOVAL     0.24564028      0.2386919         0.24799651
##               0.17891745      0.1859217         0.19507389
## rho           0.15623439      0.3765293         0.85332021
##               0.15623439      0.1666008         0.01728230
## s2           91.79121719    109.3179223                 NA
##                       NA     23.7398697                 NA

SDM impacts

print(sdm_imp_res)

##                     sdm.ml    sdm.SET0 sdm.SET1_impv
## Direct_INC     -1.02389096 -1.03403946   -1.04226728
##                 0.31964300  0.37314578    0.35260539
## Indirect_INC   -1.47671128 -1.40608729   -1.46006559
##                 0.87059750  1.27212507    0.91205203
## Total_INC      -2.50060224 -2.44012675   -2.50233287
##                 0.91970011  1.41047573    0.98548256
## Direct_HOVAL   -0.27922754 -0.28005974   -0.28265699
##                 0.09114786  0.10089586    0.10021799
## Indirect_HOVAL  0.19538498  0.19272491    0.19472467
##                 0.32523079  0.31458783    0.35834321
## Total_HOVAL    -0.08384256 -0.08733482   -0.08793232
##                 0.36077291  0.35314306    0.39776956

SAC ML

sac.ml <- sacsarlm(m.form, data = COL.OLD, listw=lw, Durbin=FALSE, 
            control=list(pre_eig1=ev, pre_eig2=ev))
summary(sac.ml)

## 
## Call:sacsarlm(formula = m.form, data = COL.OLD, listw = lw, Durbin = FALSE, 
##     control = list(pre_eig1 = ev, pre_eig2 = ev))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -37.32081  -5.33662  -0.20219   6.59672  23.25604 
## 
## Type: sac 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 47.783766   9.902659  4.8253 1.398e-06
## INC         -1.025894   0.326326 -3.1438  0.001668
## HOVAL       -0.281651   0.090033 -3.1283  0.001758
## 
## Rho: 0.36807
## Asymptotic standard error: 0.19668
##     z-value: 1.8714, p-value: 0.061285
## Lambda: 0.16668
## Asymptotic standard error: 0.29661
##     z-value: 0.56196, p-value: 0.57415
## 
## LR test value: 10.285, p-value: 0.0058432
## 
## Log likelihood: -182.2348 for sac model
## ML residual variance (sigma squared): 95.604, (sigma: 9.7777)
## Number of observations: 49 
## Number of parameters estimated: 6 
## AIC: 376.47, (AIC for lm: 382.75)

sac.ml_imps <- summary(spatialreg::impacts(sac.ml, tr=tr, R=1000), zstats=TRUE, short=TRUE)
print(sac.ml_imps)

## Impact measures (sac, trace):
##           Direct   Indirect      Total
## INC   -1.0632723 -0.5601501 -1.6234223
## HOVAL -0.2919129 -0.1537847 -0.4456977
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##           Direct  Indirect     Total
## INC   0.32622768 0.9278461 1.0375428
## HOVAL 0.09891206 0.3970437 0.4458593
## 
## Simulated z-values:
##          Direct   Indirect     Total
## INC   -3.259060 -0.7655765 -1.709359
## HOVAL -3.063347 -0.5685952 -1.185933
## 
## Simulated p-values:
##       Direct    Indirect Total   
## INC   0.0011178 0.44393  0.087385
## HOVAL 0.0021888 0.56963  0.235649

SAC SET (MH)

system.time(sac.SET0 <- spBreg_sac(m.form, data = COL.OLD, listw=lw, Durbin=FALSE,
              control=list(ndraw=titer, nomit=nomit)))

##    user  system elapsed 
##   1.442   0.003   1.451

summary(sac.SET0)

## 
## Iterations = 1001:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                 Mean       SD Naive SE Time-series SE
## (Intercept)  48.2791  9.45610 0.299028       0.822666
## INC          -1.0028  0.34683 0.010968       0.012573
## HOVAL        -0.2793  0.09652 0.003052       0.003298
## rho           0.3392  0.19008 0.006011       0.021859
## lambda        0.1532  0.27792 0.008789       0.030052
## sige        105.0834 22.49282 0.711285       0.711285
## 
## 2. Quantiles for each variable:
## 
##                2.5%      25%      50%      75%     97.5%
## (Intercept) 29.9542 41.92659  48.2736  54.6431  66.79671
## INC         -1.6766 -1.23544  -0.9924  -0.7708  -0.32513
## HOVAL       -0.4660 -0.34515  -0.2788  -0.2142  -0.09106
## rho         -0.1022  0.22484   0.3560   0.4637   0.66106
## lambda      -0.3811 -0.03245   0.1588   0.3682   0.71128
## sige        68.6209 88.99825 102.2609 118.3243 157.60420

sac.SET0_imps <- summary(spatialreg::impacts(sac.SET0, tr=tr), zstats=TRUE, short=TRUE)
print(sac.SET0_imps)

## Impact measures (sac, trace):
##           Direct   Indirect      Total
## INC   -1.0331351 -0.4844538 -1.5175890
## HOVAL -0.2877881 -0.1349485 -0.4227366
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##           Direct  Indirect     Total
## INC   0.35666369 0.4864124 0.7016177
## HOVAL 0.09973797 0.1434405 0.2032738
## 
## Simulated z-values:
##          Direct  Indirect     Total
## INC   -2.935283 -1.200011 -2.324070
## HOVAL -2.924575 -1.133069 -2.234519
## 
## Simulated p-values:
##       Direct    Indirect Total   
## INC   0.0033324 0.23014  0.020122
## HOVAL 0.0034493 0.25719  0.025449

SAC coefficients

sac_res

##             sac_ml_coefs sac_SET0_coefs
## (Intercept)  47.78376647    48.27914690
##               9.90265886     9.45609804
## INC          -1.02589358    -1.00278373
##               0.32632614     0.34682840
## HOVAL        -0.28165091    -0.27933348
##               0.09003346     0.09651832
## rho           0.36806734     0.33922573
##               0.19667646     0.19007544
## lambda        0.16667932     0.15316476
##               0.29660547     0.27791943
## s2           95.60419515   105.08341745
##                       NA    22.49281529

SAC impacts

print(sac_imp_res)

##                     sac.ml    sac.SET0
## Direct_INC     -1.06327225 -1.03313515
##                 0.32622768  0.35666369
## Indirect_INC   -0.56015006 -0.48445380
##                 0.92784613  0.48641242
## Total_INC      -1.62342231 -1.51758895
##                 1.03754281  0.70161769
## Direct_HOVAL   -0.29191292 -0.28778812
##                 0.09891206  0.09973797
## Indirect_HOVAL -0.15378473 -0.13494851
##                 0.39704368  0.14344055
## Total_HOVAL    -0.44569766 -0.42273662
##                 0.44585929  0.20327384

SDAC ML

system.time(sdac.ml <- sacsarlm(m.form, data = COL.OLD, listw=lw, Durbin=TRUE, 
            control=list(pre_eig1=ev, pre_eig2=ev)))

##    user  system elapsed 
##   0.367   0.000   0.369

summary(sdac.ml)

## 
## Call:sacsarlm(formula = m.form, data = COL.OLD, listw = lw, Durbin = TRUE, 
##     control = list(pre_eig1 = ev, pre_eig2 = ev))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -37.8045  -6.5244  -0.2207   5.9944  22.8691 
## 
## Type: sacmixed 
## Coefficients: (asymptotic standard errors) 
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 50.92025   68.25721  0.7460 0.455664
## INC         -0.95072    0.44033 -2.1591 0.030841
## HOVAL       -0.28650    0.09994 -2.8667 0.004148
## lag.INC     -0.69261    1.69113 -0.4096 0.682132
## lag.HOVAL    0.20852    0.28702  0.7265 0.467546
## 
## Rho: 0.31557
## Asymptotic standard error: 0.9458
##     z-value: 0.33365, p-value: 0.73864
## Lambda: 0.15415
## Asymptotic standard error: 1.0643
##     z-value: 0.14484, p-value: 0.88484
## 
## LR test value: 12.07, p-value: 0.016837
## 
## Log likelihood: -181.3422 for sacmixed model
## ML residual variance (sigma squared): 93.149, (sigma: 9.6514)
## Number of observations: 49 
## Number of parameters estimated: 8 
## AIC: 378.68, (AIC for lm: 382.75)

sdac.ml_imps <- summary(spatialreg::impacts(sdac.ml, tr=tr, R=1000), zstats=TRUE, short=TRUE)
print(sac.ml_imps)

## Impact measures (sac, trace):
##           Direct   Indirect      Total
## INC   -1.0632723 -0.5601501 -1.6234223
## HOVAL -0.2919129 -0.1537847 -0.4456977
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##           Direct  Indirect     Total
## INC   0.32622768 0.9278461 1.0375428
## HOVAL 0.09891206 0.3970437 0.4458593
## 
## Simulated z-values:
##          Direct   Indirect     Total
## INC   -3.259060 -0.7655765 -1.709359
## HOVAL -3.063347 -0.5685952 -1.185933
## 
## Simulated p-values:
##       Direct    Indirect Total   
## INC   0.0011178 0.44393  0.087385
## HOVAL 0.0021888 0.56963  0.235649

SDAC SET (MH)

system.time(sdac.SET0 <- spBreg_sac(m.form, data = COL.OLD, listw=lw, Durbin=TRUE,
              control=list(ndraw=titer, nomit=nomit)))

##    user  system elapsed 
##   1.465   0.002   1.473

summary(sdac.SET0)

## 
## Iterations = 1001:2000
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                 Mean       SD Naive SE Time-series SE
## (Intercept)  58.0481 22.40348 0.708460       2.547075
## INC          -0.9793  0.36718 0.011611       0.017575
## HOVAL        -0.2843  0.09447 0.002987       0.003181
## lag.INC      -0.8654  0.74086 0.023428       0.058785
## lag.HOVAL     0.1879  0.21672 0.006853       0.011749
## rho           0.2112  0.29380 0.009291       0.034349
## lambda        0.1917  0.29777 0.009416       0.034666
## sige        101.3116 23.30551 0.736985       0.873978
## 
## 2. Quantiles for each variable:
## 
##                2.5%        25%     50%      75%    97.5%
## (Intercept) 20.3456 41.1925600 55.2472  73.3555 105.2657
## INC         -1.6902 -1.2331570 -0.9819  -0.7466  -0.2701
## HOVAL       -0.4746 -0.3430497 -0.2839  -0.2205  -0.0975
## lag.INC     -2.3850 -1.3391199 -0.8466  -0.3349   0.5261
## lag.HOVAL   -0.2525  0.0591973  0.1888   0.3405   0.5921
## rho         -0.4182 -0.0005182  0.2493   0.4513   0.6737
## lambda      -0.3891 -0.0170605  0.1972   0.4132   0.7130
## sige        66.3334 85.2337068 97.4177 112.9680 158.6974

sdac.SET0_imps <- summary(spatialreg::impacts(sdac.SET0, tr=tr), zstats=TRUE, short=TRUE)
print(sdac.SET0_imps)

## Impact measures (sacmixed, trace):
##           Direct   Indirect     Total
## INC   -1.0341718 -1.3046940 -2.338866
## HOVAL -0.2777375  0.1555625 -0.122175
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##           Direct  Indirect     Total
## INC   0.36171444 0.9875037 1.0847532
## HOVAL 0.09548571 0.2969108 0.3300326
## 
## Simulated z-values:
##          Direct   Indirect      Total
## INC   -2.879967 -1.4364589 -2.2680127
## HOVAL -2.899241  0.5670263 -0.3286943
## 
## Simulated p-values:
##       Direct    Indirect Total   
## INC   0.0039772 0.15087  0.023328
## HOVAL 0.0037407 0.57070  0.742387

raftery.diag(sdac.SET0, r=0.01)

## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95 
##                                                    
##              Burn-in  Total Lower bound  Dependence
##              (M)      (N)   (Nmin)       factor (I)
##  (Intercept) 3        1143  937          1.220     
##  INC         2        969   937          1.030     
##  HOVAL       2        969   937          1.030     
##  lag.INC     3        1053  937          1.120     
##  lag.HOVAL   3        1053  937          1.120     
##  rho         22       6094  937          6.500     
##  lambda      18       4793  937          5.120     
##  sige        2        893   937          0.953

geweke.diag(sdac.SET0)

## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## (Intercept)         INC       HOVAL     lag.INC   lag.HOVAL         rho 
##     -1.3752      2.4222     -2.1499      0.8725      1.8681      1.5424 
##      lambda        sige 
##     -1.2942     -0.9987

SDAC coefficients

sdac_res

##             sdac_ml_coefs sdac_SET0_coefs
## (Intercept)   50.92025501     58.04812352
##               68.25720777     22.40347710
## INC           -0.95071665     -0.97934527
##                0.44032512      0.36718174
## HOVAL         -0.28649657     -0.28427465
##                0.09994037      0.09447263
## lag.INC       -0.69261107     -0.86544221
##                1.69112639      0.74086149
## lag.HOVAL      0.20851641      0.18790874
##                0.28702413      0.21671826
## rho            0.31556940      0.21124698
##                0.94580491      0.29380248
## lambda         0.15415430      0.19170025
##                1.06432263      0.29776693
## s2            93.14860278    101.31160615
##                        NA     23.30550749

SDAC impacts

print(sdac_imp_res)

##                   sdac.ml   sdac.SET0
## Direct_INC     -1.0317003 -1.03417179
##                 0.3849620  0.36171444
## Indirect_INC   -1.3693141 -1.30469404
##                 2.3546369  0.98750368
## Total_INC      -2.4010144 -2.33886583
##                 2.5184230  1.08475318
## Direct_HOVAL   -0.2768608 -0.27773753
##                 0.1102907  0.09548571
## Indirect_HOVAL  0.1629265  0.15556251
##                 0.7795519  0.29691076
## Total_HOVAL    -0.1139344 -0.12217502
##                 0.8401346  0.33003264

R’s `sessionInfo()`

sessionInfo()

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Fedora 34 (Workstation Edition)
## 
## Matrix products: default
## BLAS:   /home/rsb/topics/R/R410-share/lib64/R/lib/libRblas.so
## LAPACK: /home/rsb/topics/R/R410-share/lib64/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] coda_0.19-4        INLABMA_0.1-11     INLA_21.03.21      foreach_1.5.1     
##  [5] sphet_1.11         ggplot2_3.3.3      R2BayesX_1.1-1     mgcv_1.8-35       
##  [9] nlme_3.1-152       colorspace_2.0-1   BayesXsrc_3.0-1    HSAR_0.5.1        
## [13] hglm_2.2-1         hglm.data_1.0-1    sp_1.4-5           MASS_7.3-54       
## [17] MatrixModels_0.5-0 lme4_1.1-27        spmoran_0.2.1      spatialreg_1.1-9  
## [21] Matrix_1.3-3       spData_0.3.8       tmap_3.3-1         sf_0.9-9          
## 
## loaded via a namespace (and not attached):
##   [1] minqa_1.2.4         leafem_0.1.3        deldir_0.2-10      
##   [4] ellipsis_0.3.2      class_7.3-19        leaflet_2.0.4.1    
##   [7] base64enc_0.1-3     dichromat_2.0-0     proxy_0.4-25       
##  [10] Deriv_4.1.3         farver_2.1.0        mvtnorm_1.1-1      
##  [13] RSpectra_0.16-0     fansi_0.4.2         codetools_0.2-18   
##  [16] splines_4.1.0       mnormt_2.0.2        doParallel_1.0.16  
##  [19] knitr_1.33          spam_2.6-0          jsonlite_1.7.2     
##  [22] nloptr_1.2.2.2      tmaptools_3.1-1     cluster_2.1.2      
##  [25] png_0.1-7           compiler_4.1.0      assertthat_0.2.1   
##  [28] htmltools_0.5.1.1   tools_4.1.0         dotCall64_1.0-1    
##  [31] gtable_0.3.0        glue_1.4.2          dplyr_1.0.6        
##  [34] maps_3.3.0          gmodels_2.18.1      Rcpp_1.0.6         
##  [37] jquerylib_0.1.4     raster_3.4-10       vctrs_0.3.8        
##  [40] spdep_1.1-8         gdata_2.18.0        iterators_1.0.13   
##  [43] leafsync_0.1.0      crosstalk_1.1.1     lwgeom_0.2-6       
##  [46] xfun_0.23           stringr_1.4.0       lifecycle_1.0.0    
##  [49] gtools_3.8.2        XML_3.99-0.6        LearnBayes_2.15.1  
##  [52] scales_1.1.1        expm_0.999-6        metafor_2.4-0      
##  [55] RColorBrewer_1.1-2  fields_12.3         yaml_2.2.1         
##  [58] gridExtra_2.3       sass_0.4.0          stringi_1.6.2      
##  [61] highr_0.9           spDataLarge_0.5.1   e1071_1.7-7        
##  [64] permute_0.9-5       boot_1.3-28         rlang_0.4.11       
##  [67] pkgconfig_2.0.3     matrixStats_0.58.0  evaluate_0.14      
##  [70] lattice_0.20-44     purrr_0.3.4         htmlwidgets_1.5.3  
##  [73] labeling_0.4.2      tidyselect_1.1.1    magrittr_2.0.1     
##  [76] R6_2.5.0            generics_0.1.0      DBI_1.1.1          
##  [79] sn_2.0.0            pillar_1.6.1        withr_2.4.2        
##  [82] units_0.7-1         stars_0.5-2         abind_1.4-5        
##  [85] tibble_3.1.2        crayon_1.4.1        rARPACK_0.11-0     
##  [88] KernSmooth_2.23-20  utf8_1.2.1          tmvnsim_1.0-2      
##  [91] rmarkdown_2.8       viridis_0.6.1       grid_4.1.0         
##  [94] vegan_2.5-7         digest_0.6.27       classInt_0.4-3     
##  [97] numDeriv_2016.8-1.1 dbscan_1.1-8        stats4_4.1.0       
## [100] munsell_0.5.0       viridisLite_0.4.0   bslib_0.2.5.1

Alam, Moudud, Lars Rönnegård, and Xia Shen. 2015. “Fitting Conditional and Simultaneous Autoregressive Spatial Models in Hglm.” The R Journal 7 (2): 5–18. https://doi.org/10.32614/RJ-2015-017.

Bivand, Roger. 2017. “Revisiting the Boston Data Set — Changing the Units of Observation Affects Estimated Willingness to Pay for Clean Air.” REGION 4 (1): 109–27. https://doi.org/10.18335/region.v4i1.107.

Bivand, Roger S., Jan Hauke, and Tomasz Kossowski. 2013. “Computing the Jacobian in Gaussian Spatial Autoregressive Models: An Illustrated Comparison of Available Methods.” Geographical Analysis 45 (2): 150–79. https://doi.org/10.1111/gean.12008.

Bivand, Roger S., Zhe Sha, Liv Osland, and Ingrid Sandvig Thorsen. 2017. “A Comparison of Estimation Methods for Multilevel Models of Spatially Structured Data.” Spatial Statistics. https://doi.org/10.1016/j.spasta.2017.01.002.

Dong, G., R. Harris, K. Jones, and J. Yu. 2015. “Multilevel Modeling with Spatial Interaction Effects with Application to an Emerging Land Market in Beijing, China.” PLOS One 10 (6). https://doi.org/doi:10.1371/journal.pone.0130761.

Dong, Guanpeng, and Richard Harris. 2015. “Spatial Autoregressive Models for Geographically Hierarchical Data Structures.” Geographical Analysis 47 (2): 173–91. https://doi.org/10.1111/gean.12049.

Dray, S., P. Legendre, and P. R. Peres-Neto. 2006. “Spatial Modeling: A Comprehensive Framework for Principle Coordinate Analysis of Neighbor Matrices (PCNM).” Ecological Modelling 196: 483–93.

Dray, Stéphane, David Bauman, Guillaume Blanchet, Daniel Borcard, Sylvie Clappe, Guillaume Guenard, Thibaut Jombart, et al. 2021. Adespatial: Multivariate Multiscale Spatial Analysis. https://CRAN.R-project.org/package=adespatial.

Dray, Stéphane, Pierre Couteron, Marie-Josée Fortin, Pierre Legendre, Pedro R. Peres-Neto, Edwige Bellier, Roger Bivand, et al. 2012. “Community Ecology in the Age of Multivariate Multiscale Spatial Analysis.” Ecological Monographs 82: 257–75.

Goulard, Michel, Thibault Laurent, and Christine Thomas-Agnan. 2017. “About Predictions in Spatial Autoregressive Models: Optimal and Almost Optimal Strategies.” Spatial Economic Analysis 12 (2-3): 304–25. https://doi.org/10.1080/17421772.2017.1300679.

Griffith, D. A. 2010. “Spatial Filtering.” In Handbook of Applied Spatial Analysis, edited by Manfred Fischer and Arthur Getis, 301–18. Heidelberg: Springer.

Griffith, D. A., and J. H. P. Paelinck. 2011. Non-Standard Spatial Statistics and Spatial Econometrics. Berlin: Springer-Verlag.

Griffith, D. A., and P. R. Peres-Neto. 2006. “Spatial Modeling in Ecology: The Flexibility of Eigenfunction Spatial Analyses.” Ecology 87: 2603–13.

Griffith, Daniel A. 2003. Spatial Autocorrelation and Spatial Filtering. New York: Springer.

Murakami, Daisuke. 2021. Spmoran: Moran Eigenvector-Based Scalable Spatial Additive Mixed Models. https://CRAN.R-project.org/package=spmoran.

Murakami, Daisuke, and Daniel A. Griffith. 2019. “Eigenvector Spatial Filtering for Large Data Sets: Fixed and Random Effects Approaches.” Geographical Analysis 51 (1): 23–49. https://doi.org/https://doi.org/10.1111/gean.12156.

Pace, RK, and JP LeSage. 2008. “A Spatial Hausman Test.” Economics Letters 101: 282–84.

Patuelli, Roberto, Norbert Schanne, Daniel A. Griffith, and Peter Nijkamp. 2012. “Persistence of Regional Unemployment: Application of a Spatial Filtering Approach to Local Labor Markets in Germany.” Journal of Regional Science 52 (2): 300–323. https://doi.org/10.1111/j.1467-9787.2012.00759.x.

Piras, Gianfranco. 2010. “sphet: Spatial Models with Heteroskedastic Innovations in R.” Journal of Statistical Software 35 (1): 1–21.

Tiefelsdorf, M., and D. A. Griffith. 2007. “Semiparametric Filtering of Spatial Autocorrelation: The Eigenvector Approach.” Environment and Planning A 39: 1193–1221.

Umlauf, Nikolaus, Daniel Adler, Thomas Kneib, Stefan Lang, and Achim Zeileis. 2015. “Structured Additive Regression Models: An R Interface to BayesX.” Journal of Statistical Software 63 (21): 1–46. http://www.jstatsoft.org/v63/i21/.

Waller, Lance A., and Carol A. Gotway. 2004. Applied Spatial Statistics for Public Health Data. Hoboken, NJ: John Wiley & Sons.

Wood, S. N. 2017. Generalized Additive Models: An Introduction with r. 2nd ed. Chapman; Hall/CRC.

Introduction to R (II)

Roger Bivand

Tuesday 15 June 2021

Copyright

Required current contributed CRAN packages:

Script

Seminar introduction

Schedule

Spatial filtering

Upper NY data

NC SIDS data

Markov random field and multilevel models with spatial weights

Boston data set

NC SIDS data

Conditional and MRF autoregressive approaches

US Census ACS dataset

Beijing leased residential land parcels data set (2003-2009)

Parcel-level spatial weights

Formula objects

Using fitted model objects

Same framework with spatial models

SLX residual autocorrelation

Impacts: local and global

Maximum likelihood — error model

GMM estimation of SEM

ML Durbin SEM

SDEM impacts

Maximum likelihood spatial lag model

SAR impacts

GMM estimation of SAR

GMM SAR impacts

Spatial lag Durbin model

Tests between SLX, SDM and SDEM

Tests between SLX, SDM and SDEM

SDM impacts

Impacts by simple linear algebra

Total impacts by incremented prediction

MCMC spatial Durbin fitting

MCMC impacts

Total impact of log(area)

General Nested Model (GNM)

Impacts

Tests between GNM, SDM and SDEM

Bayesian spatial econometrics

Columbus toy data

Simulation framework

Check Durbin

Try hglm

HGLM SEM

ML SEM

SET SEM (griddy Gibbs I)

SET SEM (griddy Gibbs II)

SET SEM (MH)

INLA setup

INLA_slm SEM setup

INLA_slm SEM

SEM coefficients and SEs

SDEM ML

SDEM SET (griddy Gibbs I)

SDEM SET (griddy Gibbs II)

SDEM SET (MH)

SDEM INLA_slm

SDEM coefficients

SDEM impacts

SLM ML

SLM SET (griddy Gibbs)

SLM SET (MH)

SLM INLA_slm

SLM coefficients

SLM impacts

SDM ML

SDM SET (griddy Gibbs)

SDM SET (MH)

SDM INLA_slm

SDM coefficients

SDM impacts

SAC ML

SAC SET (MH)

SAC coefficients

SAC impacts

R’s `sessionInfo()`