8 Spatial autoregressive models: conditional (CAR) and simultaneous (SAR)

8.1 CAR and SAR

(from: Pebesma and Bivand (2023), pp. 233-234)

There is a large literature in disease mapping using conditional autoregressive (CAR) and intrinsic CAR (ICAR) models in spatially structured random effects. These extend to multilevel models, in which the spatially structured random effects may apply at different levels of the model (Bivand et al. 2017). In order to try out some of the variants, we need to remove the no-neighbour observations from the tract level, and from the model output zone aggregated level, in two steps as reducing the tract level induces a no-neighbour outcome at the model output zone level. Many of the model estimating functions take family arguments, and fit generalised linear mixed effects models with per-observation spatial random effects structured using a Markov random field representation of relationships between neighbours. In the multilevel case, the random effects may be modelled at the group level, which is the case presented in the following examples.

We follow Gómez-Rubio (2019) in summarising Pinheiro and Bates (2000) and McCulloch and Searle (2001) to describe the mixed-effects model representation of spatial regression models. In a Gaussian linear mixed model setting, a random effect u is added to the model, with response Y, fixed covariates X, their coefficients \beta and error term \varepsilon_i \sim N(0, \sigma^2), i=1,\dots, n:

Y = X \beta + Z u + \varepsilon

Z is a fixed design matrix for the random effects. If there are n random effects, it will be an n \times n identity matrix if instead the observations are aggregated into m groups, so with m < n random effects, it will be an n \times m matrix showing which group each observation belongs to. The random effects are modelled as a multivariate Normal distribution u \sim N(0, \sigma^2_u \Sigma), and \sigma^2_u \Sigma is the square variance-covariance matrix of the random effects.

A division has grown up, possibly unhelpfully, between scientific fields using CAR models (Besag 1974), and simultaneous autoregressive models (SAR) (Ord 1975; Hepple 1976). Although CAR and SAR models are closely related, these fields have found it difficult to share experience of applying similar models, often despite referring to key work summarising the models (Ripley 1981, 1988; Cressie 1993). Ripley gives the SAR variance as (Ripley 1981, 89), here shown as the inverse \Sigma^{-1} (also known as the precision matrix):

\Sigma^{-1} = [(I - \rho W)^\top(I - \rho W)]

where \rho is a spatial autocorrelation parameter and W is a non-singular spatial weights matrix that represents spatial dependence. The CAR variance is:

\Sigma^{-1} = (I - \rho W)

where W is a symmetric and strictly positive definite spatial weights matrix. In the case of the intrinsic CAR model, avoiding the estimation of a spatial autocorrelation parameter, we have:

\Sigma^{-1} = M = \mathrm{diag}(n_i) - W

where W is a symmetric and strictly positive definite spatial weights matrix as before and n_i are the row sums of W. The Besag-York-Mollié model includes intrinsic CAR spatially structured random effects and unstructured random effects. The Leroux model combines matrix components for unstructured and spatially structured random effects, where the spatially structured random effects are taken as following an intrinsic CAR specification:

\Sigma^{-1} = [(1 - \rho) I_n + \rho M]

References to the definitions of these models may be found in Gómez-Rubio (2020), and estimation issues affecting the Besag-York-Mollié and Leroux models are reviewed by Gerber and Furrer (2015).

More recent books expounding the theoretical bases for modelling with areal data simply point out the similarities between SAR and CAR models in relevant chapters (Gaetan and Guyon 2010; Lieshout 2019); the interested reader is invited to consult these sources for background information.

8.2 Some CAR examples with English garbage data

In spatial econometrics, CAR is not used, nor are binary spatial weights; they both are, however, extensively used in spatial epidemiology (disease mapping). The CAR model accounts for spatial autocorrelation in the residuals of a linear model by modelling that residual dependency in various ways. spatialreg provides spautolm using Maximum Likelihood (ML) including a "CAR" family member, but does not have a direct method for extracting the spatially structured random effect (SSRE) from the fitted model:

library(spdep)
lwB <- nb2listw(unb, style="B")
library(spatialreg)
car1 <- spautolm(form_pre, data=eng324, listw=lwB, family="CAR")
summary(car1, Nagelkerke=TRUE)


Call: spautolm(formula = form_pre, data = eng324, listw = lwB, family = "CAR")

Residuals:
       Min         1Q     Median         3Q        Max 
-1.2458705 -0.1271315 -0.0082244  0.1395717  0.9975448 

Coefficients: 
                 Estimate Std. Error z value  Pr(>|z|)
(Intercept)    -5.2940330  1.4601452 -3.6257 0.0002882
log(units)      0.9890709  0.0374943 26.3793 < 2.2e-16
house          -1.7376836  0.3188845 -5.4493 5.058e-08
log(dens)       0.0032286  0.0129677  0.2490 0.8033847
Metroplondon    0.1181653  0.0765654  1.5433 0.1227517
Metropmetrop    0.0032977  0.0648167  0.0509 0.9594235
log(realWgPre)  0.5710435  0.2692001  2.1213 0.0338999

Lambda: 0.12806 LR test value: 20.119 p-value: 7.2757e-06 
Numerical Hessian standard error of lambda: 0.01965 

Log likelihood: 10.54415 
ML residual variance (sigma squared): 0.051948, (sigma: 0.22792)
Number of observations: 324 
Number of parameters estimated: 9 
AIC: -3.0883
Nagelkerke pseudo-R-squared: 0.84639

Cressie (1993) gives the spatial component of the fit as \rho_{\mathrm{Err}} {\mathbf W} ({\mathbf y} - {\mathbf X}{\mathbf \beta}) (p. 564, eq. 7.6.31):

eng324$ranef_ml <- car1$fit$signal_stochastic

Hierarchical GLM hglm (Alam, Rönnegård, and Shen 2015) can be used for fitting a proper CAR model as a rand.family argument passing in the binary spatial weights coerced into a symmetric sparse matrix:

library(hglm)
W <- as(lwB, "CsparseMatrix")
car2 <- hglm(fixed=form_pre, random= ~ 1|ID, data=eng324, family = gaussian(), rand.family=CAR(D=W))
car2

Call: 
hglm.formula(family = gaussian(), rand.family = CAR(D = W), fixed = form_pre, 
    random = ~1 | ID, data = eng324)

---------------------------
Estimates of the mean model
---------------------------

Fixed effects:
   (Intercept)     log(units)          house      log(dens)   Metroplondon 
  -5.256205728    0.982604641   -1.740287615    0.006107953    0.109398222 
  Metropmetrop log(realWgPre) 
   0.015926892    0.572707449 

Random effects:
[1]  0.07512008 -0.01960200  0.14569011
...
[1] 0.11309920 0.05048601
NOTE: to show all the random effects estimates, use print(hglm.object, print.ranef = TRUE).

Dispersion parameter for the mean model: 0.03210403 

Dispersion parameter for the random effects: 3.83044e+21 

Estimation did not converge in 3 iterations

!! Observation 65 is too influential! Estimates are likely unreliable !!

The fitting algorithm does not converge, so no estimate of the spatial autoregressive parameter is returned; a warning is given that one observation is highly unusual:

as.character(eng324$NAME[65])

[1] "CITY OF LONDON"

As almost nobody lives there, the proportion of houses in pick-up points is only 0.21, far below the median of 0.93; other variables are less extreme. No provision is made for dropping the outlier, so we will keep it; the SSRE is provided:

eng324$ranef_hglm <- c(unname(car2$ranef))

Finally, an intrinsic CAR (here termed Markov Random Field, MRF) can be fitted with mgcv using gam with an "mrf" smooth, and a spatial neighbour object with names set on the object components to match the variable defining the random effects:

library(mgcv)

Loading required package: nlme

This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.

names(unb) <- attr(unb, "region.id")
eng324$NAME <- factor(eng324$NAME)
car3 <- gam(update(form_pre, . ~ . + s(NAME, bs="mrf", xt=list(nb=unb))), data=eng324)
summary(car3)


Family: gaussian 
Link function: identity 

Formula:
log(realNetPre) ~ log(units) + house + log(dens) + Metrop + log(realWgPre) + 
    s(NAME, bs = "mrf", xt = list(nb = unb))

Parametric coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -4.579478   1.778786  -2.574   0.0108 *  
log(units)      1.029406   0.037874  27.180  < 2e-16 ***
house          -1.916974   0.325147  -5.896 1.71e-08 ***
log(dens)      -0.005621   0.012749  -0.441   0.6598    
Metroplondon   -0.018417   0.089088  -0.207   0.8365    
Metropmetrop   -0.121942   0.068941  -1.769   0.0786 .  
log(realWgPre)  0.403762   0.327662   1.232   0.2194    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
        edf Ref.df     F p-value    
s(NAME) 130    319 1.057  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.918   Deviance explained = 95.2%
GCV = 0.051034  Scale est. = 0.029448  n = 324

SSRE are not returned directly, but can be extracted from predictions for the data used to fit the model, taking only the smooth term:

eng324$ranef_mrf <- unname(predict(car3, type="terms")[, "s(NAME)"])

We can tabulate the fixed effects coefficients, and see that these fitting approaches yield similar but not identical values:

cbind(ML=coef(car1), HGLM=c(car2$fixef, NA), MRF=c(coef(car3)[1:7], NA))

                         ML         HGLM          MRF
(Intercept)    -5.294032977 -5.256205728 -4.579477921
log(units)      0.989070893  0.982604641  1.029405939
house          -1.737683649 -1.740287615 -1.916973856
log(dens)       0.003228567  0.006107953 -0.005621393
Metroplondon    0.118165294  0.109398222 -0.018416579
Metropmetrop    0.003297685  0.015926892 -0.121942154
log(realWgPre)  0.571043491  0.572707449  0.403761668
lambda          0.128061085           NA           NA

We can also look at correlations between the three approximations to SSRE:

cor(st_drop_geometry(eng324)[, c("ranef_ml", "ranef_hglm", "ranef_mrf")])

            ranef_ml ranef_hglm ranef_mrf
ranef_ml   1.0000000  0.6203440 0.6620203
ranef_hglm 0.6203440  1.0000000 0.9297771
ranef_mrf  0.6620203  0.9297771 1.0000000

It is characteristic of work in disease mapping that the SSRE are mapped, and the maps are interpreted to learn whether the data collection could have been improved. It is not usual that work in spatial econometrics includes maps, and this difference has been noted critically by spatial statisticians. Here are comparative maps of the three output sets of SSRE:

library(tmap)

Breaking News: tmap 3.x is retiring. Please test v4, e.g. with
remotes::install_github('r-tmap/tmap')

tm_shape(eng324) + tm_fill(c("ranef_ml", "ranef_hglm", "ranef_mrf"), title="random effect", n=9, midpoint=0, style="fisher") + tm_borders() + tm_facets(free.scales=FALSE) + tm_layout(panel.labels=c("ML", "HGLM", "MRF"))

Maps of spatially structured random effects, conditional autoregressive models

8.3 Simultaneous autoregressive models

There are many varieties of model fitting methods for CAR models, fewer for SAR. We can begin by using spautolm using Maximum Likelihood (ML) again, choosing SAR rather than CAR, but initially retaining the binary spatial weights:

sar1B <- spautolm(form_pre, data=eng324, listw=lwB, family="SAR")
summary(sar1B, Nagelkerke=TRUE)


Call: spautolm(formula = form_pre, data = eng324, listw = lwB, family = "SAR")

Residuals:
       Min         1Q     Median         3Q        Max 
-1.1973371 -0.1330977  0.0042846  0.1472354  1.0226467 

Coefficients: 
                 Estimate Std. Error z value  Pr(>|z|)
(Intercept)    -5.2182355  1.4520462 -3.5937  0.000326
log(units)      0.9881039  0.0375060 26.3452 < 2.2e-16
house          -1.7196249  0.3196133 -5.3803 7.435e-08
log(dens)       0.0029204  0.0129370  0.2257  0.821401
Metroplondon    0.1254203  0.0767156  1.6349  0.102076
Metropmetrop    0.0093832  0.0648807  0.1446  0.885009
log(realWgPre)  0.5562152  0.2681111  2.0746  0.038026

Lambda: 0.073558 LR test value: 19.892 p-value: 8.1948e-06 
Numerical Hessian standard error of lambda: 0.015158 

Log likelihood: 10.4304 
ML residual variance (sigma squared): 0.053275, (sigma: 0.23081)
Number of observations: 324 
Number of parameters estimated: 9 
AIC: -2.8608
Nagelkerke pseudo-R-squared: 0.84628

As row-standardised weights are much more common in spatial econometrics, let us switch (back) to them. First we see that the domain of the spatial coefficient depends on the spatial weights, and is the inverse of the range of the eigenvalues of the spatial weights matrix; this applies irrespective of the estimation method:

1/range(eigenw(lwB))

[1] -0.3129207  0.1666602

lw <- nb2listw(unb, style="W")
e <- eigenw(lw)
1/range(e)

[1] -1.224561  1.000000

The value of the single spatial parameter is found by line search (univariate optimisation). In spautolm, the objective function includes the use of numerical inversion of the {\mathbf X}^\top{\mathbf A}{\mathbf X} matrix, where in the CAR case, {\mathbf A} = {\mathbf I} - \rho{\mathbf W}, but in the SAR case, it is {\mathbf A} = ({\mathbf I} - \rho{\mathbf W})^\top({\mathbf I} - \rho{\mathbf W}), using plug-in functions. Using spautolm, we get:

sar1 <- spautolm(form_pre, data=eng324, listw=lw, family="SAR", control=list(pre_eig=e))
summary(sar1, Nagelkerke=TRUE)


Call: spautolm(formula = form_pre, data = eng324, listw = lw, family = "SAR", 
    control = list(pre_eig = e))

Residuals:
       Min         1Q     Median         3Q        Max 
-1.2099474 -0.1382060 -0.0061752  0.1467393  1.0037329 

Coefficients: 
                  Estimate  Std. Error z value  Pr(>|z|)
(Intercept)    -4.99283719  1.49150170 -3.3475 0.0008154
log(units)      0.99637181  0.03760832 26.4934 < 2.2e-16
house          -1.73716260  0.31817206 -5.4598 4.766e-08
log(dens)       0.00052967  0.01256022  0.0422 0.9663626
Metroplondon    0.10864174  0.07912775  1.3730 0.1697549
Metropmetrop   -0.00633204  0.06608941 -0.0958 0.9236713
log(realWgPre)  0.50480257  0.27573061  1.8308 0.0671331

Lambda: 0.35401 LR test value: 19.988 p-value: 7.7912e-06 
Numerical Hessian standard error of lambda: 0.072223 

Log likelihood: 10.47868 
ML residual variance (sigma squared): 0.05332, (sigma: 0.23091)
Number of observations: 324 
Number of parameters estimated: 9 
AIC: -2.9574
Nagelkerke pseudo-R-squared: 0.84632

spautolm outputs the standard error of the spatial coefficient based on the numerical Hessian to estimate the information matrix of the model. errorsarlm was written before spautolm, and yields very similar values, using for moderate n as in this case asymptotic standard errors:

SEM_pre <- errorsarlm(form_pre, data=eng324, listw=lw, control=list(pre_eig=e))
summary(SEM_pre, Nagelkerke=TRUE)


Call:
errorsarlm(formula = form_pre, data = eng324, listw = lw, control = list(pre_eig = e))

Residuals:
       Min         1Q     Median         3Q        Max 
-1.2099474 -0.1382060 -0.0061752  0.1467393  1.0037330 

Type: error 
Coefficients: (asymptotic standard errors) 
                  Estimate  Std. Error z value  Pr(>|z|)
(Intercept)    -4.99283675  1.49150136 -3.3475 0.0008154
log(units)      0.99637177  0.03760832 26.4934 < 2.2e-16
house          -1.73716251  0.31817207 -5.4598 4.766e-08
log(dens)       0.00052969  0.01256022  0.0422 0.9663618
Metroplondon    0.10864178  0.07912773  1.3730 0.1697547
Metropmetrop   -0.00633193  0.06608941 -0.0958 0.9236726
log(realWgPre)  0.50480253  0.27573055  1.8308 0.0671331

Lambda: 0.35401, LR test value: 19.988, p-value: 7.7912e-06
Asymptotic standard error: 0.069512
    z-value: 5.0928, p-value: 3.5288e-07
Wald statistic: 25.936, p-value: 3.5288e-07

Log likelihood: 10.47868 for error model
ML residual variance (sigma squared): 0.05332, (sigma: 0.23091)
Nagelkerke pseudo-R-squared: 0.84632 
Number of observations: 324 
Number of parameters estimated: 9 
AIC: -2.9574, (AIC for lm: 15.031)

It avoids some loss of numerical precision in calculation of the objective function, so the coefficients are equal after moderate rounding:

all.equal(coef(SEM_pre)[c(2:8, 1)], coef(sar1), tol=2e-07)

[1] TRUE

In addition, output from errorsarlm offers the Hausman test (Pace and LeSage 2008); the SEM and OLS regression coefficients should be close to each other and significant divergence indicates model mis-specification; it seems difficult here to accept the null hypothesis of no difference:

Hausman.test(SEM_pre)


    Spatial Hausman test (asymptotic)

data:  NULL
Hausman test = 21.049, df = 7, p-value = 0.003699

8.4 Spatial econometrics models: pre-CCT examples

We have already fitted the SEM model using ML to the basic pre-CCT formula, partly based on our initial results, indicating that SEM was to be preferred to SLM, but without considering the inclusion of {\mathbf W}{\mathbf X}.

Use of Rao’s score tests only involve the fitting of least squares models, both OLS and SLX, but it is also possible to work back from the most complex model GNM, recalling its identification issues (using ML):

GNM_pre <- sacsarlm(form_pre, data=eng324, listw=lw, Durbin=update(form_pre, ~ . - Metrop), control=list(pre_eig1=e, pre_eig2=e))
SAC_pre <- sacsarlm(form_pre, data=eng324, listw=lw, control=list(pre_eig1=e, pre_eig2=e))

However, since GNM includes \rho_{\mathrm{Lag}} {\mathbf W}{\mathbf y}, presenting the conventional table of coefficient estimates is not appropriate, so we’ll stay with likelihood ratio tests, which depend on the models being compared being nested, and returning log-likelihood values:

\mathrm{LR} = -2 (\ell(\theta_0) - \ell(\theta_1))

where \ell(\theta_0) is the log-likelihood of the nested model, and \ell(\theta_1) of the encompassing model, and where the number of degrees of freedom for \chi^2 is the difference in the estimated parameter counts.

library(lmtest)

Loading required package: zoo


Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

lrtest(GNM_pre, SAC_pre)

Likelihood ratio test

Model 1: log(realNetPre) ~ log(units) + house + log(dens) + Metrop + log(realWgPre)
Model 2: log(realNetPre) ~ log(units) + house + log(dens) + Metrop + log(realWgPre)
  #Df LogLik Df  Chisq Pr(>Chisq)  
1  14 17.361                       
2  10 10.887 -4 12.947    0.01154 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This test indicates that including {\mathbf W}{\mathbf X} to the SAC model to give an GNM model only yields a minor improvement in fit. Moving on, we fit the remaining models:

SDEM_pre <- errorsarlm(form_pre, data=eng324, listw=lw, Durbin=update(form_pre, ~ . - Metrop), control=list(pre_eig=e))

Hausman.test(SDEM_pre)


    Spatial Hausman test (asymptotic)

data:  NULL
Hausman test = 19.012, df = 11, p-value = 0.06087

The Hausman test is now borderline, so the coefficients of the SDEM may be taken as close to those of the SLX model. Moving on:

SDM_pre <- lagsarlm(form_pre, data=eng324, listw=lw, Durbin=update(form_pre, ~ . - Metrop), control=list(pre_eig=e))

c(LM_test=signif(SDM_pre$LMtest), p.value=format.pval((1 - pchisq(SDM_pre$LMtest, 1))))

    LM_test     p.value 
  "8.61387" "0.0033361"

The Rao’s score (Lagrange Multiplier) test (Anselin 1988) for residual spatial autocorrelation in the SDM model shows that the model is mis-specified.

SLM_pre <- lagsarlm(form_pre, data=eng324, listw=lw, control=list(pre_eig=e))

c(LM_test=signif(SLM_pre$LMtest), p.value=format.pval((1 - pchisq(SLM_pre$LMtest, 1))))

   LM_test    p.value 
  "6.3742" "0.011579"

The LM test for residual spatial autocorrelation in the SLM model indicates that something in the {\mathbf W}{\mathbf X} is adding spatial autocorrelation to the residuals.

SLX_pre <- lmSLX(form_pre, data=eng324, listw=lw, Durbin=update(form_pre, ~ . - Metrop))

OLS_pre <- lm(form_pre, data=eng324)

we now have eight fitted pre-CCT models, and can calculate the likelihood ratio tests between nested models. GNM nests all models, SAC nests SEM, SLM and OLS, SDEM nests SEM, SLX and OLS, SDM nests SLM, SLX and OLS (and SEM through the Common Factor \rho_{\mathrm{Err}} = - \rho_{\mathrm{Lag}}{\mathbf \beta}), and SEM, SLX and SLM nest OLS:

mlist_pre <- list(GNM=GNM_pre, SAC=SAC_pre, SDEM=SDEM_pre, SDM=SDM_pre, SLX=SLX_pre, SEM=SEM_pre, SLM=SLM_pre, OLS=OLS_pre)
m <- length(mlist_pre)
LR_pre <- matrix(NA, ncol=m, nrow=m)
colnames(LR_pre) <- rownames(LR_pre) <- names(mlist_pre)
for (j in 2:m) LR_pre[1, j] <- lrtest(mlist_pre[[1]],
    mlist_pre[[j]])[[2, "Pr(>Chisq)"]]
for (j in 6:8) LR_pre[2, j] <- lrtest(mlist_pre[[2]],
    mlist_pre[[j]])[[2, "Pr(>Chisq)"]]
for (j in c(5, 6, 8)) LR_pre[3, j] <- lrtest(mlist_pre[[3]],
    mlist_pre[[j]])[[2, "Pr(>Chisq)"]]
for (j in c(5:8)) LR_pre[4, j] <- lrtest(mlist_pre[[4]],
    mlist_pre[[j]])[[2, "Pr(>Chisq)"]]
for (i in 5:7) LR_pre[i, 8] <- lrtest(mlist_pre[[i]],
    mlist_pre[[8]])[[2, "Pr(>Chisq)"]]

library(plot.matrix)
library(viridis)

Loading required package: viridisLite

opar <- par(mar=c(3.1, 3.1, 3.1, 7.1), cex.axis=0.8)
plot(LR_pre, breaks=c(0, 10^-(8:0)), col=viridis, las=1)

Heat map of likelihood ratio p-values, pre_CCT models

par(opar)

The figure shows that the only comparisons indicating minor improvement in fit by making the model more complex are from SEM to SDEM or SAC; it is not clear that adding the {\mathbf W}{\mathbf X} helps. In addition, the likelihood ratio does not penalise for the increased number of fitted paramters, unlike AIC and BIC, which penalise in increasing degree. The Akaike information criterion (AIC) is, where k is the parameter count:

\mathrm{AIC} = 2 k - 2 \ell(\theta)

sort(sapply(mlist_pre, AIC))

      GNM       SEM       SDM       SAC      SDEM       SLM       SLX       OLS 
-6.722113 -2.957365 -2.742996 -1.774990 -1.102252  3.887504 13.872713 15.031060

The BIC scores put all of the models including {\mathbf W}{\mathbf X} behind those without, pointing to SEM, followed by SAC and SLM. The Bayesian information criterion (BIC) is, for n, the observation count:

\mathrm{BIC} = k \ln(n) - 2 \ell(\theta)

sort(sapply(mlist_pre, BIC))

     SEM      SAC      SLM      OLS      GNM      SDM     SDEM      SLX 
31.06933 36.03245 37.91420 45.27701 46.20830 46.40667 48.04741 59.24163

8.5 Spatial econometrics models: pre-CCT examples including political control

SEM_pre_maj <- errorsarlm(form_pre_maj, data=eng324, listw=lw, control=list(pre_eig=e))

Hausman.test(SEM_pre_maj)


    Spatial Hausman test (asymptotic)

data:  NULL
Hausman test = 24.487, df = 9, p-value = 0.003594

GNM_pre_maj <- sacsarlm(form_pre_maj, data=eng324, listw=lw, Durbin=update(form_pre_maj, ~ . - Metrop - Majority), control=list(pre_eig1=e, pre_eig2=e))
SAC_pre_maj <- sacsarlm(form_pre_maj, data=eng324, listw=lw, control=list(pre_eig1=e, pre_eig2=e))

SDEM_pre_maj <- errorsarlm(form_pre_maj, data=eng324, listw=lw, Durbin=update(form_pre_maj, ~ . - Metrop - Majority), control=list(pre_eig=e))

Hausman.test(SDEM_pre_maj)


    Spatial Hausman test (asymptotic)

data:  NULL
Hausman test = 21.775, df = 13, p-value = 0.05894

SDM_pre_maj <- lagsarlm(form_pre_maj, data=eng324, listw=lw, Durbin=update(form_pre_maj, ~ . - Metrop - Majority), control=list(pre_eig=e))

c(LM_test=signif(SDM_pre_maj$LMtest), p.value=format.pval((1 - pchisq(SDM_pre_maj$LMtest, 1))))

     LM_test      p.value 
   "15.7135" "7.3694e-05"

SLM_pre_maj <- lagsarlm(form_pre_maj, data=eng324, listw=lw, control=list(pre_eig=e))

c(LM_test=signif(SLM_pre_maj$LMtest), p.value=format.pval((1 - pchisq(SLM_pre_maj$LMtest, 1))))

  LM_test   p.value 
"1.03321" "0.30941"

SLX_pre_maj <- lmSLX(form_pre_maj, data=eng324, listw=lw, Durbin=update(form_pre_maj, ~ . - Metrop - Majority))

OLS_pre_maj <- lm(form_pre_maj, data=eng324)

mlist_pre_maj <- list(GNM=GNM_pre_maj, SAC=SAC_pre_maj, SDEM=SDEM_pre_maj, SDM=SDM_pre_maj, SLX=SLX_pre_maj, SEM=SEM_pre_maj, SLM=SLM_pre_maj, OLS=OLS_pre_maj)
m <- length(mlist_pre_maj)
LR_pre_maj <- matrix(NA, ncol=m, nrow=m)
colnames(LR_pre_maj) <- rownames(LR_pre_maj) <- names(mlist_pre_maj)
for (j in 2:m) LR_pre_maj[1, j] <- lrtest(mlist_pre_maj[[1]],
    mlist_pre_maj[[j]])[[2, "Pr(>Chisq)"]]
for (j in 6:8) LR_pre_maj[2, j] <- lrtest(mlist_pre_maj[[2]],
    mlist_pre_maj[[j]])[[2, "Pr(>Chisq)"]]
for (j in c(5, 6, 8)) LR_pre_maj[3, j] <- lrtest(mlist_pre_maj[[3]],
    mlist_pre_maj[[j]])[[2, "Pr(>Chisq)"]]
for (j in c(5:8)) LR_pre_maj[4, j] <- lrtest(mlist_pre_maj[[4]],
    mlist_pre_maj[[j]])[[2, "Pr(>Chisq)"]]
for (i in 5:7) LR_pre_maj[i, 8] <- lrtest(mlist_pre_maj[[i]],
    mlist_pre_maj[[8]])[[2, "Pr(>Chisq)"]]

opar <- par(mar=c(3.1, 3.1, 3.1, 7.1), cex.axis=0.8)
plot(LR_pre_maj, breaks=c(0, 10^-(8:0)), col=viridis, las=1)

Heat map of likelihood ratio p-values, pre_CCT models including politiacl control

par(opar)

sort(sapply(mlist_pre_maj, AIC))

      GNM       SDM      SDEM       SLX       SLM       SAC       SEM       OLS 
-38.29440 -29.30102 -23.60534 -21.31938 -20.13818 -19.52201 -18.43801 -11.04604

sort(sapply(mlist_pre_maj, BIC))

     SLM      GNM      SEM      SAC      OLS      SDM      SLX     SDEM 
21.45000 22.19749 23.15016 25.84692 26.76140 27.41013 31.61103 33.10581

8.6 Spatial econometrics models: post-CCT examples including political control

SEM_post_maj <- errorsarlm(form_post_maj, data=eng324, listw=lw, control=list(pre_eig=e))

Hausman.test(SEM_post_maj)


    Spatial Hausman test (asymptotic)

data:  NULL
Hausman test = 21.981, df = 9, p-value = 0.008938

GNM_post_maj <- sacsarlm(form_post_maj, data=eng324, listw=lw, Durbin=update(form_post_maj, ~ . - Metrop - Majority), control=list(pre_eig1=e, pre_eig2=e))
SAC_post_maj <- sacsarlm(form_post_maj, data=eng324, listw=lw, control=list(pre_eig1=e, pre_eig2=e))

SDEM_post_maj <- errorsarlm(form_post_maj, data=eng324, listw=lw, Durbin=update(form_post_maj, ~ . - Metrop - Majority), control=list(pre_eig=e))

Hausman.test(SDEM_post_maj)


    Spatial Hausman test (asymptotic)

data:  NULL
Hausman test = 15.783, df = 13, p-value = 0.261

SDM_post_maj <- lagsarlm(form_post_maj, data=eng324, listw=lw, Durbin=update(form_post_maj, ~ . - Metrop - Majority), control=list(pre_eig=e))

c(LM_test=signif(SDM_post_maj$LMtest), p.value=format.pval((1 - pchisq(SDM_post_maj$LMtest, 1))))

   LM_test    p.value 
 "5.46586" "0.019391"

SLM_post_maj <- lagsarlm(form_post_maj, data=eng324, listw=lw, control=list(pre_eig=e))

c(LM_test=signif(SLM_post_maj$LMtest), p.value=format.pval((1 - pchisq(SLM_post_maj$LMtest, 1))))

  LM_test   p.value 
"1.89042" "0.16916"

SLX_post_maj <- lmSLX(form_post_maj, data=eng324, listw=lw, Durbin=update(form_post_maj, ~ . - Metrop - Majority))

OLS_post_maj <- lm(form_post_maj, data=eng324)

mlist_post_maj <- list(GNM=GNM_post_maj, SAC=SAC_post_maj, SDEM=SDEM_post_maj, SDM=SDM_post_maj, SLX=SLX_post_maj, SEM=SEM_post_maj, SLM=SLM_post_maj, OLS=OLS_post_maj)
m <- length(mlist_post_maj)
LR_post_maj <- matrix(NA, ncol=m, nrow=m)
colnames(LR_post_maj) <- rownames(LR_post_maj) <- names(mlist_post_maj)
for (j in 2:m) LR_post_maj[1, j] <- lrtest(mlist_post_maj[[1]],
    mlist_post_maj[[j]])[[2, "Pr(>Chisq)"]]
for (j in 6:8) LR_post_maj[2, j] <- lrtest(mlist_post_maj[[2]],
    mlist_post_maj[[j]])[[2, "Pr(>Chisq)"]]
for (j in c(5, 6, 8)) LR_post_maj[3, j] <- lrtest(mlist_post_maj[[3]],
    mlist_post_maj[[j]])[[2, "Pr(>Chisq)"]]
for (j in c(5:8)) LR_post_maj[4, j] <- lrtest(mlist_post_maj[[4]],
    mlist_post_maj[[j]])[[2, "Pr(>Chisq)"]]
for (i in 5:7) LR_post_maj[i, 8] <- lrtest(mlist_post_maj[[i]],
    mlist_post_maj[[8]])[[2, "Pr(>Chisq)"]]

opar <- par(mar=c(3.1, 3.1, 3.1, 7.1), cex.axis=0.8)
plot(LR_post_maj, breaks=c(0, 10^-(8:0)), col=viridis, las=1)

Heat map of likelihood ratio p-values, post_CCT models including political control

par(opar)

sort(sapply(mlist_post_maj, AIC))

      GNM       SDM      SDEM       SLX       SEM       SAC       SLM       OLS 
-65.97796 -63.54597 -59.60015 -56.38760 -54.05232 -53.87685 -53.76302 -48.47510

sort(sapply(mlist_post_maj, BIC))

       SEM        SLM        OLS        SAC        SDM        GNM        SLX 
-12.464138 -12.174841 -10.667662  -8.507931  -6.834818  -5.486068  -3.457190 
      SDEM 
 -2.889000

So we might reasonably choose to proceed with SLM including political control, but will have to wait to present coefficient estimates, or rather the marginal effects, because SLM includes \rho_{\mathrm{Lag}} {\mathbf W}{\mathbf y}.

Alam, Moudud, Lars Rönnegård, and Xia Shen. 2015. “Fitting Conditional and Simultaneous Autoregressive Spatial Models in Hglm.” The R Journal 7 (2): 5–18. https://doi.org/10.32614/RJ-2015-017.

Anselin, Luc. 1988. “Lagrange Multiplier Test Diagnostics for Spatial Dependence and Spatial Heterogeneity.” Geographical Analysis 20 (1): 1–17. https://doi.org/10.1111/j.1538-4632.1988.tb00159.x.

Besag, Julian. 1974. “Spatial Interaction and the Statistical Analysis of Lattice Systems.” Journal of the Royal Statistical Society. Series B (Methodological) 36: pp. 192–236.

Bivand, Roger, Zhe Sha, Liv Osland, and Ingrid Sandvig Thorsen. 2017. “A Comparison of Estimation Methods for Multilevel Models of Spatially Structured Data.” Spatial Statistics. https://doi.org/10.1016/j.spasta.2017.01.002.

Cressie, Noel A. C. 1993. Statistics for Spatial Data. New York: John Wiley & Sons.

Gaetan, Carlo, and Xavier Guyon. 2010. Spatial Statistics and Modeling. New York: Springer.

Gerber, Florian, and Reinhard Furrer. 2015. “Pitfalls in the Implementation of Bayesian Hierarchical Modeling of Areal Count Data: An Illustration Using BYM and Leroux Models.” Journal of Statistical Software, Code Snippets 63 (1): 1–32. https://doi.org/10.18637/jss.v063.c01.

Gómez-Rubio, Virgilio. 2019. “Spatial Data Analysis with INLA. Coding Club UC3M Tutorial Series. Universidad Carlos III de Madrid.” https://codingclubuc3m.rbind.io/talk/2019-11-05/.

———. 2020. Bayesian Inference with INLA. Boca Raton, FL: CRC Press.

Hepple, Leslie W. 1976. “A Maximum Likelihood Model for Econometric Estimation with Spatial Series.” In Theory and Practice in Regional Science, edited by I. Masser, 90–104. London Papers in Regional Science. London: Pion.

Lieshout, M. N. M. van. 2019. Theory of Spatial Statistics. Boca Raton, FL: Chapman & Hall/CRC.

McCulloch, Charles E., and Shayle R. Searle. 2001. Generalized, Linear, and Mixed Models. New York: Wiley.

Ord, J. Keith. 1975. “Estimation Methods for Models of Spatial Interaction.” Journal of the American Statistical Association 70: 120–26.

Pace, R. Kelley, and James P. LeSage. 2008. “A Spatial Hausman Test.” Economics Letters 101: 282–84. https://doi.org/10.1016/j.econlet.2008.09.003.

Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science with Applications in R. Boca Raton, FL: CRC Press.

Pinheiro, Jose C., and Douglas M. Bates. 2000. Mixed-Effects Models in S and S-Plus. New York: Springer.

Ripley, Brian D. 1981. Spatial Statistics. New York: John Wiley & Sons.

———. 1988. Statistical Inference for Spatial Processes. Cambridge: Cambridge University Press.