All the material presented here, to the extent it is original, is available under CC-BY-SA.
I am running R 4.1.0, with recent update.packages()
.
needed <- c("sphet", "coda", "spatialreg", "spData", "spdep", "sf")
Script and data at https://github.com/rsbivand/UAM21_II/raw/main/UAM21_II_210614.zip. Download to suitable location, unzip and use as basis.
Time | Topic |
---|---|
Monday 7/6 | |
09.00-12.00 | How may we integrate spatial data from different sources? How may we aggregate spatial data? Which data structures are helpful in handling spatial data? |
13.00-16.00 | Observations of spatial data are related to each other either by distance, or by a graph of edges linking observations seen as being neighbours. How may they be constructed? How may we address the issues raised by the probable presence of spatial autocorrelation in the spatial data that we are using? |
Tuesday 8/6 | |
09.00-12.00 | How can we measure global spatial autocorrelation? |
13.00-16.00 | How can we measure local spatial autocorrelation? |
Monday 14/6 | |
09.00-12.00 | How may we fit regression models to spatial data in the presence of spatial autocorrelation? Maximum likelihood GMM and Bayesian estimation, case weights. |
13.00-16.00 | How should we interpret the coefficients or impacts of spatial regression models? How may we predict from spatial regression models? |
Tuesday 15/6 | |
09.00-12.00 | Multi-level models: at which level in the data may we fit spatial processes? |
13.00-16.00 | Spatial filtering, hierarchical GLM, GAM, etc., spatial epidemiological approaches |
Monday 21/6 | |
09.00-12.00 | Presentations/consultations/discussion |
13.00-16.00 | Presentations/consultations/discussion |
Beyond software functionality, the existence of communities of interest becomes important
These extend the choices available, and bring in points of view across applied scientific domains and in modern computation, for example interworking across C++, Python and Julia with R
For example, the HSAR package [Dong and Harris (2015); dongetal:15] uses C++ extensively, rather than R sparse matrix packages
The model being introduced into INLA for non-MCMC Bayesian inference is another example of cooperation with other applied statisticians
The R packages spatialreg, sphet (Roger S. Bivand and Piras 2015) and for spatial panel models splm (Millo and Piras 2012) provide implementations of many model fitting functions for Gaussian dependent variables
These also use shared methods in spdep for computing impacts, and for MC inference on impacts
There are a range of fitting functions for non-Gaussian dependent variables in McSpatial, spatialprobit (Wilhelm and Matos 2013) and ProbitSpatial (Martinetti and Geniaux 2017), and these are also available using the ´slm´ model being introduced into INLA
Work is continuing on providing Bayesian inference functions, through MCMC and otherwise; the HSAR package is an example
Even though it may be tempting to focus on interpreting the map pattern of an area support response variable of interest, the pattern may largely derive from covariates (and their functional forms), as well as the respective spatial footprints of the variables in play. Spatial autoregressive models in two dimensions began without covariates and with clear links to time series (Whittle 1954). Extensions included tests for spatial autocorrelation in linear model residuals, and models applying the autoregressive component to the response or the residuals, where the latter matched the tests for residuals (A. Cliff and Ord 1972; A. D. Cliff and Ord 1973). These “lattice” models of areal data typically express the dependence between observations using a graph of neighbours in the form of a contiguity matrix.
A division has grown up, possibly unhelpfully, between scientific fields using conditional autoregressive (CAR) models (Besag 1974), and simultaneous autoregressive models (SAR) (Ord 1975; Hepple 1976). Although CAR and SAR models are closely related, these fields have found it difficult to share experience of applying similar models, often despite referring to key work summarising the models (Ripley 1981, 1988; Cressie 1993). Ripley gives the SAR variance as (1981, 89):
\[ {\rm Var}_S = \sigma^ 2(I-\lambda W_S)^{-1} (I-\lambda W_S^{\rm T})^{-1} \] where \(\lambda\) is a spatial autocorrelation parameter and \(W_S\) is a nonsingular matrix that represents spatial dependence. The CAR variance is:
\[ {\rm Var}_C = \sigma^ 2(I-\lambda W_C)^{-1} \] where and \(W_C\) is a symmetric and strictly positive definite matrix
More recent books expounding the theoretical bases for modelling with areal data simply point out the similarities in relevant chapters (Gaetan and Guyon 2010; Lieshout 2019); the interested reader is invited to consult these sources for background information and examples using the functions described below.
Of course, handling a spatial correlation structure in a generalised least squares model or a (generalised) linear or nonlinear mixed effects model such as those provided in the nlme and many other packages does not have to use a graph of neighbours (Pinheiro and Bates 2000). These models are also spatial regression models, using functions of the distance between observations, and fitted variograms to model the spatial autocorrelation present; such models have been held to yield a clearer picture of the underlying processes (Wall 2004), building on geostatistics. For example, the glmmTMB package successfully uses this approach to spatial regression (Brooks et al. 2017). Here we will only consider spatial regression using spatial weights, chiefly as implemented in the spatialreg package recently split out of the spdep package which had grown unnecessarily large, covering too many aspects of spatial dependence.
Recent development: (R. Bivand, Millo, and Piras 2021)
Spatial autoregression models using spatial weights matrices were described in some detail using maximum likelihood estimation some time ago (A. D. Cliff and Ord 1973, 1981). A family of models were elaborated in spatial econometric terms extending earlier work, and in many cases using the simultaneous autoregressive framework and row standardization of spatial weights (Anselin 1988). The simultaneous and conditional autoregressive frameworks can be compared, and both can be supplemented using case weights to reflect the relative importance of different observations (Waller and Gotway 2004).
Here we shall use the Boston housing data set, which has been restructured and furnished with census tract boundaries (R. Bivand 2017). The original data set used 506 census tracts and a hedonic model to try to estimate willingness to pay for clean air. The response was constructed from counts of ordinal answers to a 1970 census question about house value; the response is left and right censored in the census source. The key covariate was created from a calibrated meteorological model showing the annual nitrogen oxides (NOX) level for a smaller number of model output zones. The numbers of houses responding also varies by tract and model output zone. There are several other covariates, some measured at the tract level, some by town only, where towns broadly correspond to the air pollution model output zones.
library(sf)
## Linking to GEOS 3.10.0dev, GDAL 3.3.0, PROJ 8.0.1
library(spatialreg)
## Loading required package: spData
## Loading required package: Matrix
boston_506 <- st_read(system.file("shapes/boston_tracts.shp", package="spData")[1])
## Reading layer `boston_tracts' from data source
## `/home/rsb/lib/r_libs/spData/shapes/boston_tracts.shp' using driver `ESRI Shapefile'
## Simple feature collection with 506 features and 36 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -71.52311 ymin: 42.00305 xmax: -70.63823 ymax: 42.67307
## Geodetic CRS: NAD27
nb_q <- spdep::poly2nb(boston_506)
lw_q <- spdep::nb2listw(nb_q, style="W")
We can start by reading in the 506 tract data set from spData, and creating a contiguity neighbour object and from that again a row standardized spatial weights object. If we examine the median house values, we find that they have been assigned as missing values, and that 17 tracts are affected.
table(boston_506$censored)
##
## left no right
## 2 489 15
summary(boston_506$median)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 5600 16800 21000 21749 24700 50000 17
Next, we can subset to the remaining 489 tracts with non-censored house values, and the neighbour object to match. The neighbour object now has one observation with no neighbours.
boston_506$CHAS <- as.factor(boston_506$CHAS)
boston_489 <- boston_506[!is.na(boston_506$median),]
nb_q_489 <- spdep::poly2nb(boston_489)
lw_q_489 <- spdep::nb2listw(nb_q_489, style="W", zero.policy=TRUE)
The NOX_ID
variable specifies the upper level aggregation, letting us aggregate the tracts to air pollution model output zones. We can create aggregate neighbour and row standardized spatial weights objects, and aggregate the NOX
variable taking means, and the CHAS
Charles River dummy variable for observations on the river.
agg_96 <- list(as.character(boston_506$NOX_ID))
boston_96 <- aggregate(boston_506[, "NOX_ID"], by=agg_96, unique)
nb_q_96 <- spdep::poly2nb(boston_96)
lw_q_96 <- spdep::nb2listw(nb_q_96)
boston_96$NOX <- aggregate(boston_506$NOX, agg_96, mean)$x
boston_96$CHAS <- aggregate(as.integer(boston_506$CHAS)-1, agg_96, max)$x
The response is aggregated using the weightedMedian()
function in matrixStats, and midpoint values for the house value classes. Counts of houses by value class were punched to check the published census values, which can be replicated using weightedMedian()
at the tract level. Here we find two output zones with calculated weighted medians over the upper census question limit of USD 50,000, and remove them subsequently as they also are affected by not knowing the appropriate value to insert for the top class by value.
nms <- names(boston_506)
ccounts <- 23:31
for (nm in nms[c(22, ccounts, 36)]) {
boston_96[[nm]] <- aggregate(boston_506[[nm]], agg_96, sum)$x
}
br2 <- c(3.50, 6.25, 8.75, 12.50, 17.50, 22.50, 30.00, 42.50, 60.00)*1000
counts <- as.data.frame(boston_96)[, nms[ccounts]]
f <- function(x) matrixStats::weightedMedian(x=br2, w=x, interpolate=TRUE)
boston_96$median <- apply(counts, 1, f)
is.na(boston_96$median) <- boston_96$median > 50000
summary(boston_96$median)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 9009 20417 23523 25263 30073 49496 2
Before subsetting, we aggregate the remaining covariates by weighted mean using the tract population counts punched from the census (R. Bivand 2017).
boston_94 <- boston_96[!is.na(boston_96$median),]
nb_q_94 <- spdep::subset.nb(nb_q_96, !is.na(boston_96$median))
lw_q_94 <- spdep::nb2listw(nb_q_94, style="W")
We now have two data sets each at the lower, census tract level and the upper, air pollution model output zone level, one including the censored observations, the other excluding them.
The original model related the log of median house values by tract to the square of NOX values, including other covariates usually related to house value by tract, such as aggregate room counts, aggregate age, ethnicity, social status, distance to downtown and to the nearest radial road, a crime rate, and town-level variables reflecting land use (zoning, industry), taxation and education (R. Bivand 2017). This structure will be used here to exercise issues raised in fitting spatial regression models, including the presence of multiple levels.
form <- formula(log(median) ~ CRIM + ZN + INDUS + CHAS + I((NOX*10)^2) + I(RM^2) +
AGE + log(DIS) + log(RAD) + TAX + PTRATIO + I(BB/100) +
log(I(LSTAT/100)))
Before moving to presentations of issues raised in fitting spatial regression models, it is worth making a few further points. A recent review of spatial regression in a spatial econometrics setting is given by Kelejian and Piras (2017); note that their usage is to call the spatial coefficient of the lagged response \(\lambda\) and that of the lagged residuals \(\rho\), the reverse of other usage (Anselin 1988; James P. LeSage and Pace 2009); here we use \(\rho_{\mathrm{Lag}}\) for the spatial coefficient in the spatial lag model, and \(\rho_{\mathrm{Err}}\) for the spatial error model. One interesting finding is that relatively dense spatial weights matrices may downweight model estimates, suggesting that sparser weights are preferable (Tony E. Smith 2009). Another useful finding is that the presence of residual spatial autocorrelation need not bias the estimates of variance of regression coefficients, provided that the covariates themselves do not exhibit spatial autocorrelation (T. E. Smith and Lee 2012). In general, however, the footprints of the spatial processes of the response and covariates may not be aligned, and if covariates and the residual are autocorrelated, it is likely that the estimates of variance of regression coefficients will be biassed downwards if attempts are not made to model the spatial processes.
In trying to model these spatial processes, we may choose to model the spatial autocorrelation in the residual with a spatial error model (SEM).
\[ {\mathbf y} = {\mathbf X}{\mathbf \beta} + {\mathbf u}, \qquad {\mathbf u} = \rho_{\mathrm{Err}} {\mathbf W} {\mathbf u} + {\mathbf \varepsilon}, \] where \({\mathbf y}\) is an \((N \times 1)\) vector of observations on a response variable taken at each of \(N\) locations, \({\mathbf X}\) is an \((N \times k)\) matrix of covariates, \({\mathbf \beta}\) is a \((k \times 1)\) vector of parameters, \({\mathbf u}\) is an \((N \times 1)\) spatially autocorrelated disturbance vector, \({\mathbf \varepsilon}\) is an \((N \times 1)\) vector of independent and identically distributed disturbances and \(\rho_{\mathrm{Err}}\) is a scalar spatial parameter.
If the processes in the covariates and the response match, we should find little difference between the coefficients of a least squares and a SEM, but very often they diverge, suggesting that a Hausman test for this condition should be employed (R. Pace and LeSage 2008). This may be related to earlier discussions of a spatial equivalent to the unit root and cointegration where spatial processes match (Fingleton 1999).
A model with a spatial process in the response only is termed a spatial lag model (SLM, often SAR - spatial autoregressive) (James P. LeSage and Pace 2009).
\[ {\mathbf y} = \rho_{\mathrm{Lag}} {\mathbf W}{\mathbf y} + {\mathbf X}{\mathbf \beta} + {\mathbf \varepsilon}, \] where \(\rho_{\mathrm{Lag}}\) is a scalar spatial parameter.
Work reviewed by Mur and Angulo (2006) on the Durbin model; the Durbin model adds the spatially lagged covariates to the covariates included in the spatial lag model, giving a spatial Durbin model (SDM) with different processes in the response and covariates:
\[ {\mathbf y} = \rho_{\mathrm{Lag}} {\mathbf W}{\mathbf y} + {\mathbf X}{\mathbf \beta} + {\mathbf W}{\mathbf X}{\mathbf \gamma} + {\mathbf \varepsilon}, \] where \({\mathbf \gamma}\) is a \((k' \times 1)\) vector of parameters. \(k'\) defines the subset of the intercept and covariates, often \(k' = k-1\) when using row standardised spatial weights and omitting the spatially lagged intercept.
This permits the spatial processes to be viewed and tested for as a Common Factor (Burridge 1981; R. S. Bivand 1984). The inclusion of spatially lagged covariates lets us check whether the same spatial process is manifest in the response and the covariates (SEM), whether they are different processes, or whether no process is detected. The Common Factor is present when \({\mathbf \gamma} = - \rho_{\mathrm{Lag}} {\mathbf \beta}\):
\[ {\mathbf y} = \rho_{\mathrm{Lag}} {\mathbf W}{\mathbf y} + {\mathbf X}{\mathbf \beta} - \rho_{\mathrm{Lag}} {\mathbf W}{\mathbf X} {\mathbf \beta} + {\mathbf \varepsilon}, \qquad ({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W}){\mathbf y} = ({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W}) {\mathbf X}{\mathbf \beta} + {\mathbf \varepsilon}, \] where \({\mathbf I}\) is the \(N \times N\) identity matrix, and \({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W}\) is the Common Factor:
\[ \qquad {\mathbf y} = {\mathbf X}{\mathbf \beta} + ({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W})^{-1} {\mathbf \varepsilon}, \qquad {\mathbf y} = {\mathbf X}{\mathbf \beta} + {\mathbf u}, \qquad {\mathbf u} = \rho_{\mathrm{Err}} {\mathbf W} {\mathbf u} + {\mathbf \varepsilon}, \]
If we extend this family with processes in the covariates and the residual, we get a spatial error Durbin model (SDEM). If it is chosen to admit a spatial process in the residuals in addition to a spatial process in the response, again two models are formed, a general nested model (GNM) nesting all the others, and a model without spatially lagged covariates (SAC, also known as SARAR - Spatial AutoRegressive-AutoRegressive model). If neither the residuals nor the residual are modelled with spatial processes, spatially lagged covariates may be added to a linear model, as a spatially lagged X model (SLX) (Elhorst 2010; Roger S. Bivand 2012; J. P. LeSage 2014; Halleck Vega and Elhorst 2015).
We can write the general nested model (GNM) as:
\[ {\mathbf y} = \rho_{\mathrm{Lag}} {\mathbf W}{\mathbf y} + {\mathbf X}{\mathbf \beta} + {\mathbf W}{\mathbf X}{\mathbf \gamma} + {\mathbf u}, \qquad {\mathbf u} = \rho_{\mathrm{Err}} {\mathbf W} {\mathbf u} + {\mathbf \varepsilon}, \]
This may be constrained to the double spatial coefficient model SAC/SARAR by setting \({\mathbf \gamma} = 0\), to the spatial Durbin (SDM) by setting \(\rho_{\mathrm{Err}} = 0\), and to the error Durbin model (SDEM) by setting \(\rho_{\mathrm{Lag}} = 0\). Imposing more conditions gives the spatial lag model (SLM) with \({\mathbf \gamma} = 0\) and \(\rho_{\mathrm{Err}} = 0\), the spatial error model (SEM) with \({\mathbf \gamma} = 0\) and \(\rho_{\mathrm{Lag}} = 0\), and the spatially lagged X model (SLX) with \(\rho_{\mathrm{Lag}} = 0\) and \(\rho_{\mathrm{Err}} = 0\).
Although making predictions for new locations for which covariates are observed was raised as an issue some time ago, it has many years to make progress in reviewing the possibilities (R. S. Bivand 2002; Goulard, Laurent, and Thomas-Agnan 2017). The prediction methods for SLM, SDM, SEM, SDEM, SAC and GNM models fitted with maximum likelihood were contributed as a Google Summer of Coding project by Martin Gubri. This work, and work on similar models with missing data (Suesse 2018) is also relevant for exploring censored median house values in the Boston data set. Work on prediction also exposed the importance of the reduced form of these models, in which the spatial process in the response interacts with the regression coefficients in the SLM, SDM, SAC and GNM models.
The consequence of these interactions is that a unit change in a covariate will only impact the response as the value of the regression coefficient if the spatial coefficient of the lagged response is zero. Where it is non-zero, global spillovers, impacts, come into play, and these impacts should be reported rather than the regression coefficients (James P. LeSage and Pace 2009; Elhorst 2010; Roger S. Bivand 2012; J. P. LeSage 2014; Halleck Vega and Elhorst 2015). Local impacts may be reported for SDEM and SLX models, using linear combination to calculate standard errors for the total impacts of each covariate (sums of coefficients on the covariates and their spatial lags).
This can be seen from the GNM data generation process:
\[ ({\mathbf I} - \rho_{\mathrm{Err}} {\mathbf W})({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W}){\mathbf y} = ({\mathbf I} - \rho_{\mathrm{Err}} {\mathbf W})({\mathbf X}{\mathbf \beta} + {\mathbf W}{\mathbf X}{\mathbf \gamma}) + {\mathbf \varepsilon}, \] re-writing:
\[ {\mathbf y} = ({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W})^{-1}({\mathbf X}{\mathbf \beta} + {\mathbf W}{\mathbf X}{\mathbf \gamma}) + ({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W})^{-1}({\mathbf I} - \rho_{\mathrm{Err}} {\mathbf W})^{-1}{\mathbf \varepsilon}. \] There is interaction between the \(\rho_{\mathrm{Lag}}\) and \({\mathbf \beta}\) (and \({\mathbf \gamma}\) if present) coefficients. This can be seen from the partial derivatives: \(\partial y_i / \partial x_{jr} = (({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W})^{-1} ({\mathbf I} \beta_r + {\mathbf W} \gamma_r))_{ij}\). This dense matrix \(S_r({\mathbf W}) = (({\mathbf I} - \rho_{\mathrm{Lag}} {\mathbf W})^{-1} ({\mathbf I} \beta_r + {\mathbf W} \gamma_r))\) expresses the direct impacts (effects) on its principal diagonal, and indirect impacts in off-diagonal elements.
Current work in the spatialreg package is focused on refining the handling of spatially lagged covariates using a consistent Durbin=
argument taking either a logical value or a formula giving the subset of covariates to add in spatially lagged form. There is a speculation that some covariates, for example some dummy variables, should not be added in spatially lagged form. This then extends to handling these included spatially lagged covariates appropriately in calculating impacts. This work applies to cross-sectional models fitted using MCMC or maximum likelihood, and will offer facilities to spatial panel models.
It is worth mentioning the almost unexplored issues of functional form assumptions, for which flexible structures are useful, including spatial quantile regression presented in the McSpatial package (McMillen 2013). There are further issues with discrete response variables, covered by some functions in McSpatial, and in the spatialprobit and ProbitSpatial packages (Wilhelm and Matos 2013; Martinetti and Geniaux 2017); the MCMC implementations of the former are based on LeSage and Pace (2009). Finally, Wagner and Zeileis (2019) show how an SLM model may be used in the setting of recursive partitioning, with an implementation using spatialreg::lagsarlm()
in the lagsarlmtree package.
The review of cross-sectional maximum likelihood and generalized method of moments (GMM) estimators in spatialreg and sphet for spatial econometrics style spatial regression models by Bivand and Piras (2015) is still largely valid. In the review, estimators in these R packages were compared with alternative implementations available in other programming languages elsewhere. The review did not cover Bayesian spatial econometrics style spatial regression. More has changed with respect to spatial panel estimators described in Millo and Piras (2012), but will not be covered here.
For models with single spatial coefficients (SEM and SDEM using errorsarlm()
, SLM and SDM using lagsarlm()
), the methods initially described by Ord (1975) are used. The following table shows the functions that can be used to estimate the models described above using maximum likelihood.
model | model name | maximum likelihood estimation function |
---|---|---|
SEM | spatial error | errorsarlm(..., Durbin=FALSE, ...) |
SEM | spatial error | spautolm(..., family="SAR", ...) |
SDEM | spatial Durbin error | errorsarlm(..., Durbin=TRUE, ...) |
SLM | spatial lag | lagsarlm(..., Durbin=FALSE, ...) |
SDM | spatial Durbin | lagsarlm(..., Durbin=TRUE, ...) |
SAC | spatial autoregressive combined | sacsarlm(..., Durbin=FALSE, ...) |
GNM | general nested | sacsarlm(..., Durbin=TRUE, ...) |
The estimating functions errorsarlm()
and lagsarlm()
take similar arguments, where the first two, formula=
and data=
are shared by most model estimating functions. The third argument is a listw
spatial weights object, while na.action=
behaves as in other model estimating functions if the spatial weights can reasonably be subsetted to avoid observations with missing values. The weights=
argument may be used to provide weights indicating the known degree of per-observation variability in the variance term - this is not available for lagsarlm()
.
args(errorsarlm)
## function (formula, data = list(), listw, na.action, weights = NULL,
## Durbin, etype, method = "eigen", quiet = NULL, zero.policy = NULL,
## interval = NULL, tol.solve = .Machine$double.eps, trs = NULL,
## control = list())
## NULL
args(lagsarlm)
## function (formula, data = list(), listw, na.action, Durbin, type,
## method = "eigen", quiet = NULL, zero.policy = NULL, interval = NULL,
## tol.solve = .Machine$double.eps, trs = NULL, control = list())
## NULL
The Durbin=
argument replaces the earlier type=
and etype=
arguments, and if not given is taken as FALSE
. If given, it may be FALSE
, TRUE
in which case all spatially lagged covariates are included, or a one-sided formula specifying which spatially lagged covariates should be included. The method=
argument gives the method for calculating the log determinant term in the log likelihood function, and defaults to "eigen"
, suitable for moderately sized data sets. The interval=
argument gives the bounds of the domain for the line search using stats::optimize()
used for finding the spatial coefficient. The tol.solve()
argument, passed through to base::solve()
, was needed to handle data sets with differing numerical scales among the coefficients which hindered inversion of the variance-covariance matrix; the default value in base::solve()
used to be much larger. The control=
argument takes a list of control values to permit more careful adjustment of the running of the estimation function.
The spautolm()
function also fits spatial regressions with the spatial process in the residuals, and takes a possibly poorly named family=
argument, taking the values of "SAR"
for the simultaneous autoregressive model (like errorsarlm()
), "CAR"
for the conditional autoregressive model, and "SMA"
for the spatial moving average model.
args(spautolm)
## function (formula, data = list(), listw, weights, na.action,
## family = "SAR", method = "eigen", verbose = NULL, trs = NULL,
## interval = NULL, zero.policy = NULL, tol.solve = .Machine$double.eps,
## llprof = NULL, control = list())
## NULL
The sacsarlm()
function may take second spatial weights and interval arguments if the spatial weights used to model the two spatial processes in the SAC and GNM specifications differ. By default, the same spatial weights are used. By default, stats::nlminb()
is used for numerical optimization, using a heuristic to choose starting values.
args(sacsarlm)
## function (formula, data = list(), listw, listw2 = NULL, na.action,
## Durbin, type, method = "eigen", quiet = NULL, zero.policy = NULL,
## tol.solve = .Machine$double.eps, llprof = NULL, interval1 = NULL,
## interval2 = NULL, trs1 = NULL, trs2 = NULL, control = list())
## NULL
Where larger data sets are used, a numerical Hessian approach is used to calculate the variance-covariance matrix of coefficients, rather than an analytical asymptotic approach.
Apart from spautolm()
which returns an "spautolm"
object, the model fitting functions return "Sarlm"
objects. Standard methods for fitted models are provided, such as summary()
:
args(getS3method("summary", "Sarlm"))
## function (object, correlation = FALSE, Nagelkerke = FALSE, Hausman = FALSE,
## adj.se = FALSE, ...)
## NULL
The Nagelkerke=
argument permits the return of a value approximately corresponding to a coefficient of determination, although the summary method anyway provides the value of stats::AIC()
because a stats::logLik()
method is provided for "sarlm"
and "spautolm"
objects. If the "sarlm"
object is a SEM or SDEM, the Hausman test may be performed by setting Hausman=TRUE
to see whether the regression coefficients are sufficiently like least squares coefficients, indicating absence of mis-specification from that source. As an example, we may fit SEM and SDEM to the 94 and 489 observation Boston data sets, and present the Hausman test results:
eigs_489 <- eigenw(lw_q_489)
SDEM_489 <- errorsarlm(form, data=boston_489, listw=lw_q_489, Durbin=TRUE,
zero.policy=TRUE, control=list(pre_eig=eigs_489))
SEM_489 <- errorsarlm(form, data=boston_489, listw=lw_q_489,
zero.policy=TRUE, control=list(pre_eig=eigs_489))
cbind(data.frame(model=c("SEM", "SDEM")),
rbind(broom::tidy(Hausman.test(SEM_489)),
broom::tidy(Hausman.test(SDEM_489))))[,1:4]
## model statistic p.value parameter
## 1 SEM 51.9862 2.827192e-06 14
## 2 SDEM 48.6551 6.481654e-03 27
Here we are using the control=
list argument to pass through pre-computed eigenvalues for the default "eigen"
method. Both test results for the 489 tract data set suggest that the regression coefficients do differ, perhaps that the footprints of the spatial processes do not match. Likelihood ratio tests of the spatial models against their least squares equivalents show that spatial process(es) are present, but we find that neither SEM not SDM are adequate representations.
cbind(data.frame(model=c("SEM", "SDEM")),
rbind(broom::tidy(LR1.Sarlm(SEM_489)),
broom::tidy(LR1.Sarlm(SDEM_489))))[,c(1, 4:6)]
## model statistic p.value parameter
## 1 SEM 198.4133 0 1
## 2 SDEM 159.3798 0 1
For the 94 air pollution model output zones, the Hausman tests find little difference between coefficients, but this is related to the fact that the SEM and SDEM models add little to least squares or SLX using likelihood ratio tests.
eigs_94 <- eigenw(lw_q_94)
SDEM_94 <- errorsarlm(form, data=boston_94, listw=lw_q_94, Durbin=TRUE,
control=list(pre_eig=eigs_94))
SEM_94 <- errorsarlm(form, data=boston_94, listw=lw_q_94, control=list(pre_eig=eigs_94))
cbind(data.frame(model=c("SEM", "SDEM")),
rbind(broom::tidy(Hausman.test(SEM_94)),
broom::tidy(Hausman.test(SDEM_94))))[, 1:4]
## model statistic p.value parameter
## 1 SEM 15.657083 0.3347612 14
## 2 SDEM 9.205152 0.9994394 27
cbind(data.frame(model=c("SEM", "SDEM")), rbind(broom::tidy(LR1.Sarlm(SEM_94)), broom::tidy(LR1.Sarlm(SDEM_94))))[,c(1, 4:6)]
## model statistic p.value parameter
## 1 SEM 2.5933581 0.1073126 1
## 2 SDEM 0.2158487 0.6422213 1
We can use spatialreg::LR.sarlm()
to apply a likelihood ratio test between nested models, but here choose lmtest::lrtest()
, which gives the same results, preferring models including spatially lagged covariates.
broom::tidy(lmtest::lrtest(SEM_489, SDEM_489))
## Warning in tidy.anova(lmtest::lrtest(SEM_489, SDEM_489)): The following column
## names in ANOVA output were not recognized or transformed: X.Df, LogLik
## # A tibble: 2 x 5
## X.Df LogLik df statistic p.value
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 16 273. NA NA NA
## 2 29 311. 13 74.4 1.23e-10
broom::tidy(lmtest::lrtest(SEM_94, SDEM_94))
## Warning in tidy.anova(lmtest::lrtest(SEM_94, SDEM_94)): The following column
## names in ANOVA output were not recognized or transformed: X.Df, LogLik
## # A tibble: 2 x 5
## X.Df LogLik df statistic p.value
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 16 59.7 NA NA NA
## 2 29 81.3 13 43.2 0.0000421
The SLX model is fitted using least squares, and also returns a log likelihood value, letting us test whether we need a spatial process in the residuals.
args(lmSLX)
## function (formula, data = list(), listw, na.action, weights = NULL,
## Durbin = TRUE, zero.policy = NULL)
## NULL
In the tract data set we obviously do:
SLX_489 <- lmSLX(form, data=boston_489, listw=lw_q_489, zero.policy=TRUE)
broom::tidy(lmtest::lrtest(SLX_489, SDEM_489))
## Warning in modelUpdate(objects[[i - 1]], objects[[i]]): original model was of
## class "SlX", updated model is of class "Sarlm"
## Warning in tidy.anova(lmtest::lrtest(SLX_489, SDEM_489)): The following column
## names in ANOVA output were not recognized or transformed: X.Df, LogLik
## # A tibble: 2 x 5
## X.Df LogLik df statistic p.value
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 28 231. NA NA NA
## 2 29 311. 1 159. 1.55e-36
but in the output zone case we do not.
SLX_94 <- lmSLX(form, data=boston_94, listw=lw_q_94)
broom::tidy(lmtest::lrtest(SLX_94, SDEM_94))
## Warning in modelUpdate(objects[[i - 1]], objects[[i]]): original model was of
## class "SlX", updated model is of class "Sarlm"
## Warning in tidy.anova(lmtest::lrtest(SLX_94, SDEM_94)): The following column
## names in ANOVA output were not recognized or transformed: X.Df, LogLik
## # A tibble: 2 x 5
## X.Df LogLik df statistic p.value
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 28 81.2 NA NA NA
## 2 29 81.3 1 0.216 0.642
This outcome is sustained also when we use the counts of house units by tract and output zones as weights:
SLX_94w <- lmSLX(form, data=boston_94, listw=lw_q_94, weights=units)
SDEM_94w <- errorsarlm(form, data=boston_94, listw=lw_q_94, Durbin=TRUE, weights=units,
control=list(pre_eig=eigs_94))
broom::tidy(lmtest::lrtest(SLX_94w, SDEM_94w))
## Warning in modelUpdate(objects[[i - 1]], objects[[i]]): original model was of
## class "SlX", updated model is of class "Sarlm"
## Warning in tidy.anova(lmtest::lrtest(SLX_94w, SDEM_94w)): The following column
## names in ANOVA output were not recognized or transformed: X.Df, LogLik
## # A tibble: 2 x 5
## X.Df LogLik df statistic p.value
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 28 97.5 NA NA NA
## 2 29 98.0 1 0.917 0.338
The estimation methods used for fitting SLM and the spatially lagged response part of SARAR models are based on instrumental variables, using the spatially lagged covariates (usually including lags of lags too) as instruments for the spatially lagged response (Piras 2010). This, and the use of spatially lagged covariates in the moment conditions for models including a spatial process in the residuals, means that including the spatially lagged covariates in the initial model is challenging, and functions in the sphet package do not provide a Durbin=
argument. This makes it harder to accommodate data with multiple spatial process footprints. However, as Kelejian and Piras show in their recent review (2017), these approaches have other benefits, such as being able to instrument variables suspected of suffering from endogeneity or measurement error. Let us first compare the ML and GMM estimates of the SEM regression coefficients for rescaled squared NOX values. We use the sphet::spreg()
wrapper function, and fit a SEM model. Extracting the matching rows from the summary objects for these models, we can see that the values are not dissimilar, despite the difference in estimation methods.
SEM1_94 <- sphet::spreg(form, data=boston_94, listw=lw_q_94, model="error")
res <- rbind(summary(SEM_94)$Coef["I((NOX * 10)^2)",],
summary(SEM1_94)$CoefTable["I((NOX * 10)^2)",])
rownames(res) <- c("ML", "GMM")
res
## Estimate Std. Error z value Pr(>|z|)
## ML -0.009564967 0.0026099 -3.664879 0.0002474555
## GMM -0.009744770 0.0026113 -3.731770 0.0001901391
Using sphet::spreg()
, we can instrument the rescaled squared NOX variable, dropping it first from the formula, next creating the rescaled squared NOX variable as a column in the "sf"
object, extracting a matrix of coordinates from the centroids of the output zones, and creating a one-sided instrument formula from a second order polynomial in the coordinates (here improperly, as they are not projected) and mean distances to downtown and radial roads. The endog=
argument takes a one-sided formula for the variables to be modelled in the first stage of the model. Had we re-run the original air pollution model many times under slightly varying scenarios, we could have used an ensemble of NOX loadings to yield its distribution by output zone. Because this is not possible, we assume that the measurement error can be captured by using selected instruments. Unfortunately, the NOX regression coefficient estimate from the second stage has fallen substantially in absolute size, although the sign is unchanged.
formiv <- update(form, . ~ . - I((NOX*10)^2))
boston_94$NOX2 <- (boston_94$NOX*10)^2
suppressWarnings(ccoords <- st_coordinates(st_centroid(st_geometry(boston_94))))
iform <- formula(~poly(ccoords, degree=2) + DIS + RAD)
SEM1_94iv <- sphet::spreg(formiv, data=boston_94, listw=lw_q_94, endog = ~NOX2,
instruments=iform, model="error")
summary(SEM1_94iv)$CoefTable["NOX2",]
## Estimate Std. Error t-value Pr(>|t|)
## -0.001972107 0.005664617 -0.348144820 0.727731426
Handling measurement error in this or similar ways is one of the benefits of GMM estimation methods, although here the choice of instruments was somewhat arbitrary.
The Spatial Econometrics Library is part of the extensive Matlab code repository at https://www.spatial-econometrics.com/ and documented in LeSage and Pace (2009). The Google Summer of Coding project in 2011 by Abhirup Mallik mentored by Virgilio Gómez-Rubio yielded translations of some of the model fitting functions for SEM, SDEM, SLM, SDM, SAC and GNM from the Matlab code. These have now been added to spatialreg as spBreg_err()
, spBreg_lag()
and spBreg_sac()
with Durbin=
arguments to handle the inclusion of spatially lagged covariates. As yet, heteroskedastic disturbances are not accommodated. The functions return "mcmc"
objects as specified in the coda package, permitting the use of tools from coda for handling model output. The default burnin is 500 draws, followed by default 2000 draws returned, but these values and many others may be set through the control=
list argument. Fitting the SDEM model for the output zones takes about an order of magnitude longer than using ML, but there is more work to do subsequently, and this difference scales more in the number of samples than covariates or observations.
system.time(SDEM_94B <- spBreg_err(form, data=boston_94, listw=lw_q_94, Durbin=TRUE))
## user system elapsed
## 1.750 0.010 1.767
system.time(SDEM_489B <- spBreg_err(form, data=boston_489, listw=lw_q_489, Durbin=TRUE, zero.policy=TRUE))
## user system elapsed
## 2.467 0.001 2.477
Most time in the ML case using eigenvalues is taken by log determinant setup and optimization, and by dense matrix asymptotic standard errors if chosen (always chosen with default "eigen"
log determinant method):
t(errorsarlm(form, data=boston_94, listw=lw_q_94, Durbin=TRUE)$timings[,2])
## set_up eigen_set_up eigen_opt coefs eigen_hcov eigen_se
## [1,] 0.004 0.004 0.007 0.003 0.002 0.001
t(errorsarlm(form, data=boston_489, listw=lw_q_489, Durbin=TRUE, zero.policy=TRUE)$timings[,2])
## set_up eigen_set_up eigen_opt coefs eigen_hcov eigen_se
## [1,] 0.007 0.052 0.025 0.006 0.096 0.187
while in the MCMC case, the default use of Spatial Econometric Toolbox gridded log determinants and obviously sampling takes most time:
t(attr(SDEM_94B, "timings")[ , 3])
## set_up SE_classic_set_up complete_setup sampling finalise
## [1,] 0.003 0.25 0.03 1.427 0.056
t(attr(SDEM_489B, "timings")[ , 3])
## set_up SE_classic_set_up complete_setup sampling finalise
## [1,] 0.005 0.414 0.07 1.935 0.053
However, as we will see shortly, inference from model impacts may need sampling (where the spatially lagged response is included in the model), and in the case of MCMC models, the samples have already been drawn.
We will briefly touch on selected implementation details that applied users of spatial regression models would be wise to review. The handling of the log determinant term applies to all such users, while impacts are restricted to those employing spatial econometrics style models either including the spatially lagged response or including spatially lagged covariates.
It has been known for over twenty years that the sparse matrix representation of spatial weights overcomes the difficulties of fitting models with larger numbers of observations using maximum likelihood and MCMC where the log determinant term comes into play (R. K. Pace and Barry 1997a, 1997c, 1997d, 1997b). During the development of these approaches in model fitting functions in spatialreg, use was first made of C code also used in the S-PLUS SpatialStats module (Kaluzny et al. 1998), then SparseM which used a compressed sparse row form very similar to "nb"
and "listw"
objects. This was followed by the use of spam and Matrix methods, both of which mainly use compressed sparse column representations. Details are provided in Bivand, Hauke and Kossowski (2013).
The domain of the spatial coefficient(s) is given by the interval=
argument to model fitting functions, and returned in the fitted object:
SEM_94$interval
## [1] -1.527257 1.000000
This case is trivial, because the upper bound is unity by definition, because of the use of row standardization. The interval is the inverse of the range of the eigenvalues of the weights matrix:
1/range(eigs_94)
## [1] -1.527257 1.000000
Finding the interval within which to search for the spatial coefficient is trivial for smaller data sets, but more complex for larger ones. It is possible to use heuristics implemented in lextrW()
(Griffith, Bivand, and Chun 2015):
1/c(lextrW(lw_q_94))
## lambda_n lambda_1
## -1.527257 1.000000
or RSpectra::eigs()
after coercion to a Matrix package compressed sparse column representation:
W <- as(lw_q_94, "CsparseMatrix")
1/Re(c(RSpectra::eigs(W, k=1, which="SR")$values,
RSpectra::eigs(W, k=1, which="LR")$values))
## [1] -1.527257 1.000000
Why are we extracting the real part of the values returned by eigs()
? Since nb_q_94
is symmetric, the row-standardized object lw_q_94
is symmetric by similarity, a result known since Ord (1975); consequently, we can take the real part without concern. Had the underlying neighbour relationships not been symmetric, we should be more careful.
The baseline log determinant term as given by Ord (1975) for a coefficient value proposed in sampling or during numerical optimization; this extract matches the "eigen"
method (with or without control=list(pre_eig=...)"
):
coef <- 0.5
sum(log(1 - coef * eigs_94))
## [1] -2.867292
Using sparse matrix functions from Matrix, the LU decomposition can be used for asymmetric matrices; this extract matches the "LU"
method:
I <- Diagonal(nrow(boston_94))
LU <- lu(I - coef * W)
dU <- abs(diag(slot(LU, "U")))
sum(log(dU))
## [1] -2.867292
and Cholesky decomposition for symmetric matrices, with similar.listw()
used to handle asymmetric weights that are similar to symmetric. The default value od super
allows the underlying Matrix code to choose between supernodal or simplicial decomposition; this extract matches the "Matrix_J"
method:
W <- as(similar.listw(lw_q_94), "CsparseMatrix")
super <- as.logical(NA)
cch <- Cholesky((I - coef * W), super=super)
c(2 * determinant(cch, logarithm = TRUE)$modulus)
## [1] -2.867292
The "Matrix"
and "spam_update"
methods are to be preferred as they pre-compute the fill-reducing permutation of the decomposition since the weights do not change for different values of the coefficient.
Maximum likelihood model fitting functions in spatialreg and splm use jacobianSetup()
to populate env=
environment with intermediate objects needed to find log determinants during optimization. Passing environments to objective functions is efficient because they are passed by reference rather than value. The con=
argument passes through the populated control list, containing default values unless the key-value pairs were given in the function call (pre_eig=
is extracted separately). The which=
argument is 1
by default, but will also take 2
in SAC and GNM models. HSAR uses mcdet_setup()
to set up Monte Carlo approximation terms.
args(jacobianSetup)
## function (method, env, con, pre_eig = NULL, trs = NULL, interval = NULL,
## which = 1)
## NULL
For each value of coef
, the do_ldet()
function returns the log determinant, using the values stored in environment env=
:
args(do_ldet)
## function (coef, env, which = 1)
## NULL
As yet the Bayesian models are limited to control argument ldet_method="SE_classic"
at present, using "LU"
to generate a coarse grid of control argument nrho=200L
log determinant values in the interval, spline interpolated to a finer grid of length control argument interpn=2000L
, from which griddy Gibbs samples are drawn. It is hoped to add facilities to choose alternative methods in the future. This would offer possibilities to move beyond griddy Gibbs, but using gridded log determinant values seems reasonable at present.
Global impacts have been seen as crucial for reporting results from fitting models including the spatially lagged response (SLM, SDM, SAC. GNM) for over ten years (James P. LeSage and Pace 2009). Extension to other models including spatially lagged covariates (SLX, SDEM) has followed (Elhorst 2010; Roger S. Bivand 2012; Halleck Vega and Elhorst 2015). In order to infer from the impacts, linear combination may be used for SLX and SDEM models. For SLM, SDM, SAC and GNM models fitted with maximum likelihood or GMM, the variance-covariance matrix of the coefficients is available, and can be used to make random draws from a multivariate Normal distribution with mean set to coefficient values and variance to the estimated variance-covariance matrix. For these models fitted using Bayesian methods, draws are already available. In the SDEM case, the draws on the regression coefficients of the unlagged covariates represent direct impacts, and draws on the coefficients of the spatially lagged covariates represent indirect impacts, and their by-draw sums the total impacts.
Impacts are calculated using model object class specific impacts()
methods, here taking the method for "sarlm"
objects as an example. In the sphet package, the impacts method for "gstsls"
uses the spatialreg impacts()
framework, as does the splm package for "splm"
fitted model objects. impacts()
methods require either a tr=
- a vector of traces of the power series of the weights object typically computed with trW()
or a listw=
argument. If listw=
is given, dense matrix methods are used. The evalues=
argument is experimental, does not yet work for all model types, and takes the eigenvalues of the weights matrix. The R=
argument gives the number of samples to be taken from the fitted model. The Q=
permits the decomposition of impacts to components of the power series of the weights matrix (James P. LeSage and Pace 2009).
args(getS3method("impacts", "Sarlm"))
## function (obj, ..., tr = NULL, R = NULL, listw = NULL, evalues = NULL,
## useHESS = NULL, tol = 1e-06, empirical = FALSE, Q = NULL)
## NULL
The summary method for the output of impacts()
methods where inference from samples was requested by default uses the summary()
method for "mcmc"
objects defined in the coda package. It can instead report just matrices of standard errors, z-values and p-values by setting zstats=
and short=
to TRUE
.
args(getS3method("summary", "LagImpact"))
## function (object, ..., zstats = FALSE, short = FALSE, reportQ = NULL)
## NULL
Since sampling is not required for inference for SLX and SDEM models, linear combination is used for models fitted using maximum likelihood; results are shown here for the air pollution variable only. The literature has not yet resolved the question of how to report model output, as each covariate is now represented by three impacts. Where spatially lagged covariates are included, two coefficients are replaced by three impacts.
sum_imp_94_SDEM <- summary(impacts(SDEM_94))
rbind(Impacts=sum_imp_94_SDEM$mat[5,], SE=sum_imp_94_SDEM$semat[5,])
## Direct Indirect Total
## Impacts -0.012764045 -0.018454532 -0.031218577
## SE 0.002354762 0.004717734 0.005303962
The impacts from the same model fitted by MCMC are very similar:
sum_imp_94_SDEM_B <- summary(impacts(SDEM_94B))
rbind(Impacts=sum_imp_94_SDEM_B$mat[5,], SE=sum_imp_94_SDEM_B$semat[5,])
## Direct Indirect Total
## Impacts -0.012653812 -0.01740874 -0.030062550
## SE 0.002485255 0.00575439 0.006774467
as also are those from the SLX model. In the SLX and SDEM models, the direct impacts are the consequences for the response of changes in air pollution in the same observational entity, and the indirect (local) impacts are the consequences for the response of changes in air pollution in neighbouring observational entities.
sum_imp_94_SLX <- summary(impacts(SLX_94))
rbind(Impacts=sum_imp_94_SLX$mat[5,], SE=sum_imp_94_SLX$semat[5,])
## Direct Indirect Total
## Impacts -0.012766473 -0.018743765 -0.031510238
## SE 0.002796966 0.005564633 0.006114058
In contrast to local indirect impacts in SLX and SDEM models, global indirect impacts are found in models including the spatially lagged response. For purposes of exposition, let us fit an SLM:
SLM_489 <- lagsarlm(form, data=boston_489, listw=lw_q_489, zero.policy=TRUE)
Traces of the first m=
matrices of the power series in the spatial weights are pre-computed to speed up inference from samples from the fitted model, and from existing MCMC samples (James P. LeSage and Pace 2009). The traces can also be used in other contexts too, so their pre-computation may be worthwhile anyway. The type=
argument is "mult"
by default, but may also be set to "MC"
for Monte Carlo simulation or "moments"
using a space-saving looping algorithm.
args(trW)
## function (W = NULL, m = 30, p = 16, type = "mult", listw = NULL,
## momentsSymmetry = TRUE)
## NULL
W <- as(lw_q_489, "CsparseMatrix")
tr_489 <- trW(W)
str(tr_489)
## num [1:30] 0 90.8 29.4 42.1 26.5 ...
## - attr(*, "timings")= Named num [1:2] 0.164 0.169
## ..- attr(*, "names")= chr [1:2] "user.self" "elapsed"
## - attr(*, "type")= chr "mult"
## - attr(*, "n")= int 489
In this case, the spatial process in the response is not strong, so the global indirect impacts (here for the air pollution variable) are weak.
SLM_489_imp <- impacts(SLM_489, tr=tr_489, R=2000)
SLM_489_imp_sum <- summary(SLM_489_imp, short=TRUE, zstats=TRUE)
res <- rbind(Impacts=sapply(SLM_489_imp$res, "[", 5), SE=SLM_489_imp_sum$semat[5,])
colnames(res) <- c("Direct", "Indirect", "Total")
res
## Direct Indirect Total
## Impacts -0.005930731 -1.014744e-05 -0.005940879
## SE 0.001073926 1.013427e-04 0.001082105
Of more interest is trying to reconstruct the direct and total impacts using dense matrix methods; the direct global impacts are the mean of the diagonal of the dense impacts matrix, and the total global impacts are the sum of all matrix elements divided by the number of observations. The direct impacts agree, but the total impacts differ slightly.
coef_SLM_489 <- coef(SLM_489)
IrW <- Diagonal(489) - coef_SLM_489[1] * W
S_W <- solve(IrW)
S_NOX_W <- S_W %*% (diag(489) * coef_SLM_489[7])
c(Direct=mean(diag(S_NOX_W)), Total=sum(S_NOX_W)/489)
## Direct Total
## -0.005930731 -0.005940858
This bare-bones approach corresponds to using the listw=
argument, and as expected gives the same output.
sapply(impacts(SLM_489, listw=lw_q_489), "[", 5)
## direct indirect total
## -5.930731e-03 -1.014744e-05 -5.940879e-03
The experimental evalues=
approach which is known to be numerically exact by definition gives the same results as the matrix power series trace approach, so the slight difference may be attributed to the consequences of inverting the spatial process matrix.
sapply(impacts(SLM_489, evalues=eigs_489), "[", 5)
## direct indirect total
## -5.930731e-03 -1.014744e-05 -5.940879e-03
Impacts are crucial to the interpretation of Durbin models including spatially lagged covariates and models including the spatially lagged response. Tools to calculate impacts and their inferential bases are now available, and should be employed, but as yet some implementation details are under development and ways of presenting results in tabular form have not reached maturity.
We will use the predict()
method for "sarlm"
objects to double-check impacts, here for the pupil-teacher ratio (PTRATIO
). The method was re-written by Martin Gubri based on Goulard, Laurent and Thomas-Agnan (2017). The pred.type=
argument specifies the prediction strategy among those presented in the article.
args(getS3method("predict", "sarlm"))
## function (object, newdata = NULL, listw = NULL, pred.type = "TS",
## all.data = FALSE, zero.policy = NULL, legacy = TRUE, legacy.mixed = FALSE,
## power = NULL, order = 250, tol = .Machine$double.eps^(3/5),
## spChk = NULL, ...)
## NULL
First we’ll increment PTRATIO
by one to show that, using least squares, the mean difference between predictions from the incremented new data and fitted values is equal to the regression coefficient.
nd_489 <- boston_489
nd_489$PTRATIO <- nd_489$PTRATIO + 1
OLS_489 <- lm(form, data=boston_489)
fitted <- predict(OLS_489)
nd_fitted <- predict(OLS_489, newdata=nd_489)
all.equal(unname(coef(OLS_489)[12]), mean(nd_fitted - fitted))
## [1] TRUE
In models including the spatially lagged response, and when the spatial coefficient in different from zero, this is not the case in general, and is why we need impacts()
methods. The difference here is not great, but neither is it zero, and needs to be handled.
fitted <- predict(SLM_489)
## This method assumes the response is known - see manual page
nd_fitted <- predict(SLM_489, newdata=nd_489, listw=lw_q_489, pred.type="TS",
zero.policy=TRUE)
all.equal(unname(coef_SLM_489[13]), mean(nd_fitted - fitted))
## [1] "Mean relative difference: 0.001783081"
In the Boston tracts data set, 17 observations of median house values, the response, are censored. Using these as an example and comparing some pred.type=
variants for the SDEM model and predicting out-of-sample, we can see that there are differences, suggesting that this is a fruitful area for study. There have been a number of alternative proposals for handling missing variables (Gómez-Rubio, Bivand, and Rue 2015; Suesse 2018). Another reason for increasing attention on prediction is that it is fundamental for machine learning approaches, in which prediction for validation and test data sets drives model specification choice. The choice of training and other data sets with dependent spatial data remains an open question, and is certainly not as simple as with independent data.
Here, we’ll list the predictions for the censored tract observations using three different prediction types, taking the exponent to get back to the USD median house values.
nd <- boston_506[is.na(boston_506$median),]
t0 <- exp(predict(SDEM_489, newdata=nd, listw=lw_q, pred.type="TS", zero.policy=TRUE))
suppressWarnings(t1 <- exp(predict(SDEM_489, newdata=nd, listw=lw_q, pred.type="KP2",
zero.policy=TRUE)))
suppressWarnings(t2 <- exp(predict(SDEM_489, newdata=nd, listw=lw_q, pred.type="KP5",
zero.policy=TRUE)))
data.frame(fit_TS=t0[,1], fit_KP2=c(t1), fit_KP5=c(t2),
censored=boston_506$censored[as.integer(attr(t0, "region.id"))])
## fit_TS fit_KP2 fit_KP5 censored
## 13 23912.450 29477.240 28147.151 right
## 14 28125.588 27001.192 28516.159 right
## 15 30553.380 36184.007 32476.053 right
## 17 18518.448 19620.801 18878.244 right
## 43 9563.613 6816.721 7561.353 left
## 50 8371.200 7196.445 7383.139 left
## 312 51477.038 53300.877 54172.605 right
## 313 45920.630 45823.263 47094.833 right
## 314 44195.663 44585.510 45361.371 right
## 317 43427.407 45706.682 45442.461 right
## 337 39878.563 42071.842 41126.925 right
## 346 44708.480 46693.655 46107.827 right
## 355 48187.620 49067.627 48910.651 right
## 376 42881.289 45883.193 44965.674 right
## 408 44293.862 44615.008 45669.977 right
## 418 38210.776 43374.808 41913.830 right
## 434 41647.159 41690.222 42397.968 right
Let us read in again the ACS data set with over 70,000 observations:
library(sf)
df_tracts <- st_read("df_tracts.gpkg")
## Reading layer `df_tracts' from data source
## `/home/rsb/und/uam21/units_2/df_tracts.gpkg' using driver `GPKG'
## Simple feature collection with 71353 features and 28 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7332 ymin: 24.5447 xmax: -66.9499 ymax: 49.38436
## Geodetic CRS: NAD83
set.ZeroPolicyOption(TRUE)
## [1] FALSE
spdep::set.ZeroPolicyOption(TRUE)
## [1] FALSE
nb_subset <- readRDS("nb_subset.rds")
lw <- spdep::nb2listw(nb_subset, style="W")
In the original study, logarithms of the median income CV were thought to be driven by a slightly broader set of covariates. Here we take logs of the vacancy rate, the elderly rate, the Black and Hispanic rates, the population density, and group quarters population (institutional like prisons, non-institutional like student accommodation https://www.census.gov/newsroom/releases/archives/2010_census/cb11-tps13.html).
tr_form <- log(med_inc_cv) ~ log1p(vacancy_rate) + log1p(old_rate) + log1p(black_rate) + log1p(hisp_rate) + log1p(group_pop) + log1p(dens)
The available covariates are not the same as in the original article, but our focus here is to show that larger data sets can be used in analysis:
lm_mod <- lm(tr_form, data=df_tracts)
summary(lm_mod)
##
## Call:
## lm(formula = tr_form, data = df_tracts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.14822 -0.29445 0.01038 0.30068 2.99263
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.2284926 0.0355078 -175.41 <2e-16 ***
## log1p(vacancy_rate) 0.4039100 0.0248345 16.26 <2e-16 ***
## log1p(old_rate) 0.4069179 0.0040408 100.70 <2e-16 ***
## log1p(black_rate) 0.5478924 0.0110139 49.75 <2e-16 ***
## log1p(hisp_rate) 0.5854893 0.0121825 48.06 <2e-16 ***
## log1p(group_pop) 0.0216907 0.0007901 27.45 <2e-16 ***
## log1p(dens) 0.0354277 0.0018286 19.37 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4526 on 71346 degrees of freedom
## Multiple R-squared: 0.2296, Adjusted R-squared: 0.2295
## F-statistic: 3543 on 6 and 71346 DF, p-value: < 2.2e-16
The least squares residuals show strong positive spatial autocorrelation:
spdep::lm.morantest(lm_mod, lw)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = tr_form, data = df_tracts)
## weights: lw
##
## Moran I statistic standard deviate = 50.637, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 1.117328e-01 -6.347763e-05 4.874291e-06
Adding the spatially lagged covariates in SLX model form might make sense:
SLX <- lmSLX(tr_form, data=df_tracts, listw=lw, zero.policy=TRUE)
summary(SLX)
##
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))),
## data = as.data.frame(x), weights = weights)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.14224 -0.29424 0.01015 0.30057 2.94089
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.091034 0.054824 -111.102 < 2e-16 ***
## log1p.vacancy_rate. 0.204452 0.039726 5.147 2.66e-07 ***
## log1p.old_rate. 0.402543 0.004422 91.026 < 2e-16 ***
## log1p.black_rate. 0.493457 0.027068 18.230 < 2e-16 ***
## log1p.hisp_rate. 0.607203 0.033922 17.900 < 2e-16 ***
## log1p.group_pop. 0.019748 0.000834 23.678 < 2e-16 ***
## log1p.dens. 0.010350 0.004203 2.463 0.01380 *
## lag.log1p.vacancy_rate. 0.374812 0.050936 7.359 1.88e-13 ***
## lag.log1p.old_rate. -0.016907 0.006436 -2.627 0.00861 **
## lag.log1p.black_rate. 0.053985 0.030672 1.760 0.07840 .
## lag.log1p.hisp_rate. -0.039247 0.037392 -1.050 0.29390
## lag.log1p.group_pop. 0.012803 0.001529 8.374 < 2e-16 ***
## lag.log1p.dens. 0.033688 0.004742 7.105 1.22e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4521 on 71340 degrees of freedom
## Multiple R-squared: 0.2315, Adjusted R-squared: 0.2314
## F-statistic: 1791 on 12 and 71340 DF, p-value: < 2.2e-16
The level of residual spatial autocorrelation is effectively unchanged, perhaps suggesting that the spatial footprint of the autocorrelation process(es) may not be well-represented by the chosen definition of tract neighbours:
spdep::lm.morantest(SLX, lw)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1],
## collapse = "+"))), data = as.data.frame(x), weights = weights)
## weights: lw
##
## Moran I statistic standard deviate = 50.563, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 1.115597e-01 -7.287566e-05 4.874367e-06
The linear combinations of unlagged and lagged covariates give these impacts, many pulling the same way:
summary(spatialreg::impacts(SLX))
## Impact measures (SlX, estimable, n-k):
## Direct Indirect Total
## log1p(vacancy_rate) 0.20445207 0.37481174 0.57926381
## log1p(old_rate) 0.40254315 -0.01690728 0.38563587
## log1p(black_rate) 0.49345676 0.05398505 0.54744181
## log1p(hisp_rate) 0.60720345 -0.03924693 0.56795651
## log1p(group_pop) 0.01974814 0.01280282 0.03255096
## log1p(dens) 0.01035030 0.03368823 0.04403853
## ========================================================
## Standard errors:
## Direct Indirect Total
## log1p(vacancy_rate) 0.0397260202 0.050935722 0.032647455
## log1p(old_rate) 0.0044223032 0.006435572 0.006346559
## log1p(black_rate) 0.0270685196 0.030671611 0.012643255
## log1p(hisp_rate) 0.0339216058 0.037391531 0.013710963
## log1p(group_pop) 0.0008340256 0.001528895 0.001507331
## log1p(dens) 0.0042030102 0.004741785 0.002155113
## ========================================================
## Z-values:
## Direct Indirect Total
## log1p(vacancy_rate) 5.146553 7.358524 17.74300
## log1p(old_rate) 91.025678 -2.627160 60.76298
## log1p(black_rate) 18.229913 1.760098 43.29912
## log1p(hisp_rate) 17.900198 -1.049621 41.42353
## log1p(group_pop) 23.678098 8.373907 21.59510
## log1p(dens) 2.462592 7.104546 20.43444
##
## p-values:
## Direct Indirect Total
## log1p(vacancy_rate) 2.6532e-07 1.8585e-13 < 2.22e-16
## log1p(old_rate) < 2.22e-16 0.0086101 < 2.22e-16
## log1p(black_rate) < 2.22e-16 0.0783911 < 2.22e-16
## log1p(hisp_rate) < 2.22e-16 0.2938925 < 2.22e-16
## log1p(group_pop) < 2.22e-16 < 2.22e-16 < 2.22e-16
## log1p(dens) 0.013794 1.2073e-12 < 2.22e-16
summary(df_tracts$tot_pop)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 501 2925 4028 4283 5341 37452
quantile(df_tracts$group_pop, seq(0, 1, 0.1))
## 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
## 0 0 0 0 2 7 16 40 91 187 19496
plot(df_tracts$group_pop, df_tracts$group_pop/df_tracts$tot_pop)
plot(df_tracts$tot_pop, df_tracts$group_pop/df_tracts$tot_pop)
Weighted least squares use the range of the weights to assign tracts with larger populations less residual variance, but here the group quarters populations play in too:
lm_modw <- lm(tr_form, data=df_tracts, weights=tot_pop)
summary(lm_modw)
##
## Call:
## lm(formula = tr_form, data = df_tracts, weights = tot_pop)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -165.025 -18.290 0.634 18.582 206.188
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.2252852 0.0349852 -177.94 <2e-16 ***
## log1p(vacancy_rate) 0.5567749 0.0272130 20.46 <2e-16 ***
## log1p(old_rate) 0.4053828 0.0040674 99.67 <2e-16 ***
## log1p(black_rate) 0.5058137 0.0113856 44.43 <2e-16 ***
## log1p(hisp_rate) 0.5708457 0.0114385 49.91 <2e-16 ***
## log1p(group_pop) 0.0239794 0.0007418 32.33 <2e-16 ***
## log1p(dens) 0.0336080 0.0017827 18.85 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28.5 on 71346 degrees of freedom
## Multiple R-squared: 0.2204, Adjusted R-squared: 0.2203
## F-statistic: 3361 on 6 and 71346 DF, p-value: < 2.2e-16
spdep::lm.morantest(lm_modw, lw)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = tr_form, data = df_tracts, weights = tot_pop)
## weights: lw
##
## Moran I statistic standard deviate = 49.407, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 1.090160e-01 -6.281631e-05 4.874287e-06
We can try the same with other models, such as SLX, but again with little change:
SLXw <- lmSLX(tr_form, weights=tot_pop, data=df_tracts, listw=lw, zero.policy=TRUE)
summary(SLXw)
##
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))),
## data = as.data.frame(x), weights = weights)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -164.762 -18.250 0.628 18.602 212.170
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.2010197 0.0542376 -114.331 < 2e-16 ***
## log1p.vacancy_rate. 0.2900525 0.0426211 6.805 1.02e-11 ***
## log1p.old_rate. 0.3983487 0.0044799 88.919 < 2e-16 ***
## log1p.black_rate. 0.4647722 0.0276407 16.815 < 2e-16 ***
## log1p.hisp_rate. 0.5933632 0.0325690 18.219 < 2e-16 ***
## log1p.group_pop. 0.0212821 0.0007877 27.019 < 2e-16 ***
## log1p.dens. 0.0168140 0.0040944 4.107 4.02e-05 ***
## lag.log1p.vacancy_rate. 0.4325027 0.0518089 8.348 < 2e-16 ***
## lag.log1p.old_rate. -0.0019333 0.0065644 -0.295 0.768
## lag.log1p.black_rate. 0.0407380 0.0311804 1.307 0.191
## lag.log1p.hisp_rate. -0.0310037 0.0358891 -0.864 0.388
## lag.log1p.group_pop. 0.0164220 0.0014939 10.993 < 2e-16 ***
## lag.log1p.dens. 0.0231145 0.0046590 4.961 7.02e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28.45 on 71340 degrees of freedom
## Multiple R-squared: 0.2228, Adjusted R-squared: 0.2226
## F-statistic: 1704 on 12 and 71340 DF, p-value: < 2.2e-16
spdep::lm.morantest(SLXw, lw)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1],
## collapse = "+"))), data = as.data.frame(x), weights = weights)
## weights: lw
##
## Moran I statistic standard deviate = 49.483, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 1.091760e-01 -7.215477e-05 4.874344e-06
summary(spatialreg::impacts(SLXw))
## Impact measures (SlX, estimable, n-k):
## Direct Indirect Total
## log1p(vacancy_rate) 0.29005246 0.432502675 0.72255514
## log1p(old_rate) 0.39834869 -0.001933313 0.39641538
## log1p(black_rate) 0.46477224 0.040737989 0.50551022
## log1p(hisp_rate) 0.59336317 -0.031003701 0.56235946
## log1p(group_pop) 0.02128214 0.016422044 0.03770419
## log1p(dens) 0.01681396 0.023114529 0.03992849
## ========================================================
## Standard errors:
## Direct Indirect Total
## log1p(vacancy_rate) 0.0426210645 0.051808861 0.033980108
## log1p(old_rate) 0.0044798951 0.006564424 0.006325152
## log1p(black_rate) 0.0276406965 0.031180367 0.012944562
## log1p(hisp_rate) 0.0325690346 0.035889146 0.012845410
## log1p(group_pop) 0.0007876827 0.001493860 0.001457527
## log1p(dens) 0.0040943629 0.004658979 0.002114653
## ========================================================
## Z-values:
## Direct Indirect Total
## log1p(vacancy_rate) 6.805378 8.3480444 21.26406
## log1p(old_rate) 88.919201 -0.2945138 62.67286
## log1p(black_rate) 16.814780 1.3065269 39.05194
## log1p(hisp_rate) 18.218629 -0.8638740 43.77902
## log1p(group_pop) 27.018677 10.9930292 25.86861
## log1p(dens) 4.106612 4.9612865 18.88181
##
## p-values:
## Direct Indirect Total
## log1p(vacancy_rate) 1.0078e-11 < 2.22e-16 < 2.22e-16
## log1p(old_rate) < 2.22e-16 0.76837 < 2.22e-16
## log1p(black_rate) < 2.22e-16 0.19137 < 2.22e-16
## log1p(hisp_rate) < 2.22e-16 0.38766 < 2.22e-16
## log1p(group_pop) < 2.22e-16 < 2.22e-16 < 2.22e-16
## log1p(dens) 4.0151e-05 7.0028e-07 < 2.22e-16
We can also interact the ESRI reliability classes with the other covariates:
df_tracts$mi_cv_esri <- cut(df_tracts$med_inc_cv, c(0, 0.12, 0.40, Inf), labels=c("High", "Medium", "Low"), right=TRUE, include.lowest=TRUE, ordered_result=TRUE)
ctr_form <- update(tr_form, . ~ mi_cv_esri/(. -1))
lm_modc <- lm(ctr_form, data=df_tracts)
summary(lm_modc)
##
## Call:
## lm(formula = ctr_form, data = df_tracts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.85057 -0.18982 0.01306 0.22555 1.61825
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## mi_cv_esriHigh -3.9650108 0.0310620 -127.648 < 2e-16
## mi_cv_esriMedium -3.6373216 0.0443373 -82.037 < 2e-16
## mi_cv_esriLow -1.4641573 0.3551794 -4.122 3.76e-05
## mi_cv_esriHigh:log1p(vacancy_rate) 0.1618894 0.0211417 7.657 1.92e-14
## mi_cv_esriMedium:log1p(vacancy_rate) 0.1666862 0.0281628 5.919 3.26e-09
## mi_cv_esriLow:log1p(vacancy_rate) 0.0241005 0.1577771 0.153 0.87860
## mi_cv_esriHigh:log1p(old_rate) 0.1402792 0.0035544 39.466 < 2e-16
## mi_cv_esriMedium:log1p(old_rate) 0.1919496 0.0048860 39.286 < 2e-16
## mi_cv_esriLow:log1p(old_rate) 0.0759502 0.0359904 2.110 0.03484
## mi_cv_esriHigh:log1p(black_rate) 0.2260342 0.0104547 21.620 < 2e-16
## mi_cv_esriMedium:log1p(black_rate) 0.1641366 0.0111635 14.703 < 2e-16
## mi_cv_esriLow:log1p(black_rate) 0.0313052 0.0677407 0.462 0.64399
## mi_cv_esriHigh:log1p(hisp_rate) 0.2724036 0.0109594 24.856 < 2e-16
## mi_cv_esriMedium:log1p(hisp_rate) 0.2057883 0.0130134 15.814 < 2e-16
## mi_cv_esriLow:log1p(hisp_rate) 0.0323785 0.0910110 0.356 0.72202
## mi_cv_esriHigh:log1p(group_pop) 0.0111474 0.0006592 16.911 < 2e-16
## mi_cv_esriMedium:log1p(group_pop) 0.0097060 0.0009280 10.459 < 2e-16
## mi_cv_esriLow:log1p(group_pop) 0.0171415 0.0060395 2.838 0.00454
## mi_cv_esriHigh:log1p(dens) 0.0219977 0.0016064 13.694 < 2e-16
## mi_cv_esriMedium:log1p(dens) 0.0066425 0.0019684 3.375 0.00074
## mi_cv_esriLow:log1p(dens) -0.0055058 0.0127856 -0.431 0.66674
##
## mi_cv_esriHigh ***
## mi_cv_esriMedium ***
## mi_cv_esriLow ***
## mi_cv_esriHigh:log1p(vacancy_rate) ***
## mi_cv_esriMedium:log1p(vacancy_rate) ***
## mi_cv_esriLow:log1p(vacancy_rate)
## mi_cv_esriHigh:log1p(old_rate) ***
## mi_cv_esriMedium:log1p(old_rate) ***
## mi_cv_esriLow:log1p(old_rate) *
## mi_cv_esriHigh:log1p(black_rate) ***
## mi_cv_esriMedium:log1p(black_rate) ***
## mi_cv_esriLow:log1p(black_rate)
## mi_cv_esriHigh:log1p(hisp_rate) ***
## mi_cv_esriMedium:log1p(hisp_rate) ***
## mi_cv_esriLow:log1p(hisp_rate)
## mi_cv_esriHigh:log1p(group_pop) ***
## mi_cv_esriMedium:log1p(group_pop) ***
## mi_cv_esriLow:log1p(group_pop) **
## mi_cv_esriHigh:log1p(dens) ***
## mi_cv_esriMedium:log1p(dens) ***
## mi_cv_esriLow:log1p(dens)
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3054 on 71332 degrees of freedom
## Multiple R-squared: 0.9837, Adjusted R-squared: 0.9837
## F-statistic: 2.054e+05 on 21 and 71332 DF, p-value: < 2.2e-16
There is still a lot of residual autocorrelation:
spdep::lm.morantest(lm_modc, lw)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = ctr_form, data = df_tracts)
## weights: lw
##
## Moran I statistic standard deviate = 26.685, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 5.883956e-02 -7.266655e-05 4.873925e-06
anova(lm_modc)
## Analysis of Variance Table
##
## Response: log(med_inc_cv)
## Df Sum Sq Mean Sq F value Pr(>F)
## mi_cv_esri 3 401888 133963 1.4359e+06 < 2.2e-16 ***
## mi_cv_esri:log1p(vacancy_rate) 3 53 18 1.9014e+02 < 2.2e-16 ***
## mi_cv_esri:log1p(old_rate) 3 245 82 8.7423e+02 < 2.2e-16 ***
## mi_cv_esri:log1p(black_rate) 3 94 31 3.3653e+02 < 2.2e-16 ***
## mi_cv_esri:log1p(hisp_rate) 3 141 47 5.0366e+02 < 2.2e-16 ***
## mi_cv_esri:log1p(group_pop) 3 37 12 1.3377e+02 < 2.2e-16 ***
## mi_cv_esri:log1p(dens) 3 19 6 6.6365e+01 < 2.2e-16 ***
## Residuals 71332 6655 0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conditioning on ESRI reliability yields an improvement in fit:
anova(lm_mod, lm_modc)
## Analysis of Variance Table
##
## Model 1: log(med_inc_cv) ~ log1p(vacancy_rate) + log1p(old_rate) + log1p(black_rate) +
## log1p(hisp_rate) + log1p(group_pop) + log1p(dens)
## Model 2: log(med_inc_cv) ~ mi_cv_esri + mi_cv_esri:log1p(vacancy_rate) +
## mi_cv_esri:log1p(old_rate) + mi_cv_esri:log1p(black_rate) +
## mi_cv_esri:log1p(hisp_rate) + mi_cv_esri:log1p(group_pop) +
## mi_cv_esri:log1p(dens) - 1
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 71346 14617.9
## 2 71332 6654.7 14 7963.1 6096.9 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can also fit the SLX model on the conditioned representation:
SLXc <- lmSLX(ctr_form, data=df_tracts, listw=lw, zero.policy=TRUE)
summary(SLXc)
##
## Call:
## lm(formula = formula(paste("y ~ 0 + ", paste(colnames(x), collapse = "+"))),
## data = as.data.frame(x), weights = weights)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.91411 -0.18856 0.01429 0.22304 1.61503
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## mi_cv_esriHigh -4.0901041 0.0814612 -50.209 < 2e-16
## mi_cv_esriMedium -3.7875996 0.0877375 -43.170 < 2e-16
## mi_cv_esriLow -1.7110394 0.3614910 -4.733 2.21e-06
## mi_cv_esriHigh.log1p.vacancy_rate. 0.0863174 0.0302771 2.851 0.004361
## mi_cv_esriMedium.log1p.vacancy_rate. 0.0618700 0.0346242 1.787 0.073958
## mi_cv_esriLow.log1p.vacancy_rate. -0.0724166 0.1579352 -0.459 0.646580
## mi_cv_esriHigh.log1p.old_rate. 0.1448400 0.0038209 37.908 < 2e-16
## mi_cv_esriMedium.log1p.old_rate. 0.1969933 0.0050701 38.854 < 2e-16
## mi_cv_esriLow.log1p.old_rate. 0.0875928 0.0358438 2.444 0.014538
## mi_cv_esriHigh.log1p.black_rate. 0.1811410 0.0198064 9.146 < 2e-16
## mi_cv_esriMedium.log1p.black_rate. 0.1495776 0.0199107 7.512 5.87e-14
## mi_cv_esriLow.log1p.black_rate. 0.0625038 0.0693053 0.902 0.367133
## mi_cv_esriHigh.log1p.hisp_rate. 0.2830902 0.0239404 11.825 < 2e-16
## mi_cv_esriMedium.log1p.hisp_rate. 0.2470295 0.0249812 9.889 < 2e-16
## mi_cv_esriLow.log1p.hisp_rate. 0.1345804 0.0931431 1.445 0.148496
## mi_cv_esriHigh.log1p.group_pop. 0.0106472 0.0006818 15.617 < 2e-16
## mi_cv_esriMedium.log1p.group_pop. 0.0088891 0.0009425 9.431 < 2e-16
## mi_cv_esriLow.log1p.group_pop. 0.0173298 0.0060110 2.883 0.003940
## mi_cv_esriHigh.log1p.dens. 0.0155850 0.0029777 5.234 1.66e-07
## mi_cv_esriMedium.log1p.dens. 0.0036302 0.0033277 1.091 0.275324
## mi_cv_esriLow.log1p.dens. -0.0082932 0.0130541 -0.635 0.525240
## lag.mi_cv_esriHigh 0.3598838 0.0950003 3.788 0.000152
## lag.mi_cv_esriMedium 0.9177353 0.1195552 7.676 1.66e-14
## lag.mi_cv_esriLow 0.6935848 0.8284068 0.837 0.402454
## lag.mi_cv_esriHigh.log1p.vacancy_rate. 0.0416962 0.0417443 0.999 0.317871
## lag.mi_cv_esriMedium.log1p.vacancy_rate. 0.2563055 0.0580503 4.415 1.01e-05
## lag.mi_cv_esriLow.log1p.vacancy_rate. -0.1267525 0.3680428 -0.344 0.730550
## lag.mi_cv_esriHigh.log1p.old_rate. -0.0370668 0.0068030 -5.449 5.09e-08
## lag.mi_cv_esriMedium.log1p.old_rate. -0.0757979 0.0103622 -7.315 2.60e-13
## lag.mi_cv_esriLow.log1p.old_rate. -0.0308527 0.0840665 -0.367 0.713617
## lag.mi_cv_esriHigh.log1p.black_rate. 0.0517898 0.0251535 2.059 0.039502
## lag.mi_cv_esriMedium.log1p.black_rate. -0.1170101 0.0273028 -4.286 1.82e-05
## lag.mi_cv_esriLow.log1p.black_rate. -0.2485985 0.1625651 -1.529 0.126213
## lag.mi_cv_esriHigh.log1p.hisp_rate. -0.0012532 0.0291621 -0.043 0.965722
## lag.mi_cv_esriMedium.log1p.hisp_rate. -0.2254311 0.0332033 -6.789 1.13e-11
## lag.mi_cv_esriLow.log1p.hisp_rate. -0.2781210 0.2057159 -1.352 0.176390
## lag.mi_cv_esriHigh.log1p.group_pop. 0.0047556 0.0013283 3.580 0.000344
## lag.mi_cv_esriMedium.log1p.group_pop. 0.0019507 0.0020011 0.975 0.329666
## lag.mi_cv_esriLow.log1p.group_pop. -0.0079431 0.0140328 -0.566 0.571370
## lag.mi_cv_esriHigh.log1p.dens. 0.0123808 0.0038052 3.254 0.001140
## lag.mi_cv_esriMedium.log1p.dens. -0.0111389 0.0044558 -2.500 0.012427
## lag.mi_cv_esriLow.log1p.dens. -0.0082363 0.0299712 -0.275 0.783465
##
## mi_cv_esriHigh ***
## mi_cv_esriMedium ***
## mi_cv_esriLow ***
## mi_cv_esriHigh.log1p.vacancy_rate. **
## mi_cv_esriMedium.log1p.vacancy_rate. .
## mi_cv_esriLow.log1p.vacancy_rate.
## mi_cv_esriHigh.log1p.old_rate. ***
## mi_cv_esriMedium.log1p.old_rate. ***
## mi_cv_esriLow.log1p.old_rate. *
## mi_cv_esriHigh.log1p.black_rate. ***
## mi_cv_esriMedium.log1p.black_rate. ***
## mi_cv_esriLow.log1p.black_rate.
## mi_cv_esriHigh.log1p.hisp_rate. ***
## mi_cv_esriMedium.log1p.hisp_rate. ***
## mi_cv_esriLow.log1p.hisp_rate.
## mi_cv_esriHigh.log1p.group_pop. ***
## mi_cv_esriMedium.log1p.group_pop. ***
## mi_cv_esriLow.log1p.group_pop. **
## mi_cv_esriHigh.log1p.dens. ***
## mi_cv_esriMedium.log1p.dens.
## mi_cv_esriLow.log1p.dens.
## lag.mi_cv_esriHigh ***
## lag.mi_cv_esriMedium ***
## lag.mi_cv_esriLow
## lag.mi_cv_esriHigh.log1p.vacancy_rate.
## lag.mi_cv_esriMedium.log1p.vacancy_rate. ***
## lag.mi_cv_esriLow.log1p.vacancy_rate.
## lag.mi_cv_esriHigh.log1p.old_rate. ***
## lag.mi_cv_esriMedium.log1p.old_rate. ***
## lag.mi_cv_esriLow.log1p.old_rate.
## lag.mi_cv_esriHigh.log1p.black_rate. *
## lag.mi_cv_esriMedium.log1p.black_rate. ***
## lag.mi_cv_esriLow.log1p.black_rate.
## lag.mi_cv_esriHigh.log1p.hisp_rate.
## lag.mi_cv_esriMedium.log1p.hisp_rate. ***
## lag.mi_cv_esriLow.log1p.hisp_rate.
## lag.mi_cv_esriHigh.log1p.group_pop. ***
## lag.mi_cv_esriMedium.log1p.group_pop.
## lag.mi_cv_esriLow.log1p.group_pop.
## lag.mi_cv_esriHigh.log1p.dens. **
## lag.mi_cv_esriMedium.log1p.dens. *
## lag.mi_cv_esriLow.log1p.dens.
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3035 on 71311 degrees of freedom
## Multiple R-squared: 0.9839, Adjusted R-squared: 0.9839
## F-statistic: 1.041e+05 on 42 and 71311 DF, p-value: < 2.2e-16
spdep::lm.morantest(SLXc, lw)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = formula(paste("y ~ 0 + ", paste(colnames(x),
## collapse = "+"))), data = as.data.frame(x), weights = weights)
## weights: lw
##
## Moran I statistic standard deviate = 21.462, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 4.722938e-02 -1.493084e-04 4.873462e-06
summary(spatialreg::impacts(SLXc))
## Impact measures (SlX, estimable, n-k):
## Direct Indirect Total
## mi_cv_esriHigh -4.090104132 0.359883802 -3.730220329
## mi_cv_esriMedium -3.787599585 0.917735347 -2.869864238
## mi_cv_esriLow -1.711039382 0.693584754 -1.017454628
## mi_cv_esriHigh:log1p(vacancy_rate) 0.086317399 0.041696231 0.128013630
## mi_cv_esriMedium:log1p(vacancy_rate) 0.061870035 0.256305459 0.318175495
## mi_cv_esriLow:log1p(vacancy_rate) -0.072416561 -0.126752462 -0.199169022
## mi_cv_esriHigh:log1p(old_rate) 0.144840014 -0.037066793 0.107773221
## mi_cv_esriMedium:log1p(old_rate) 0.196993270 -0.075797877 0.121195393
## mi_cv_esriLow:log1p(old_rate) 0.087592829 -0.030852720 0.056740109
## mi_cv_esriHigh:log1p(black_rate) 0.181140990 0.051789817 0.232930806
## mi_cv_esriMedium:log1p(black_rate) 0.149577592 -0.117010089 0.032567503
## mi_cv_esriLow:log1p(black_rate) 0.062503849 -0.248598460 -0.186094611
## mi_cv_esriHigh:log1p(hisp_rate) 0.283090232 -0.001253238 0.281836994
## mi_cv_esriMedium:log1p(hisp_rate) 0.247029465 -0.225431066 0.021598398
## mi_cv_esriLow:log1p(hisp_rate) 0.134580420 -0.278120971 -0.143540551
## mi_cv_esriHigh:log1p(group_pop) 0.010647239 0.004755558 0.015402797
## mi_cv_esriMedium:log1p(group_pop) 0.008889058 0.001950689 0.010839747
## mi_cv_esriLow:log1p(group_pop) 0.017329841 -0.007943069 0.009386772
## mi_cv_esriHigh:log1p(dens) 0.015585033 0.012380828 0.027965861
## mi_cv_esriMedium:log1p(dens) 0.003630192 -0.011138902 -0.007508710
## mi_cv_esriLow:log1p(dens) -0.008293167 -0.008236288 -0.016529455
## ========================================================
## Standard errors:
## Direct Indirect Total
## mi_cv_esriHigh 0.0814611959 0.095000270 0.057699914
## mi_cv_esriMedium 0.0877375277 0.119555240 0.096886235
## mi_cv_esriLow 0.3614910215 0.828406833 0.887832679
## mi_cv_esriHigh:log1p(vacancy_rate) 0.0302770553 0.041744275 0.032787516
## mi_cv_esriMedium:log1p(vacancy_rate) 0.0346242226 0.058050256 0.053524495
## mi_cv_esriLow:log1p(vacancy_rate) 0.1579352418 0.368042774 0.396163158
## mi_cv_esriHigh:log1p(old_rate) 0.0038208790 0.006803000 0.006666668
## mi_cv_esriMedium:log1p(old_rate) 0.0050701074 0.010362231 0.010737603
## mi_cv_esriLow:log1p(old_rate) 0.0358438232 0.084066540 0.090446739
## mi_cv_esriHigh:log1p(black_rate) 0.0198064357 0.025153463 0.017097134
## mi_cv_esriMedium:log1p(black_rate) 0.0199107463 0.027302820 0.020220798
## mi_cv_esriLow:log1p(black_rate) 0.0693052818 0.162565109 0.172245086
## mi_cv_esriHigh:log1p(hisp_rate) 0.0239403788 0.029162060 0.018012159
## mi_cv_esriMedium:log1p(hisp_rate) 0.0249812349 0.033203320 0.024192830
## mi_cv_esriLow:log1p(hisp_rate) 0.0931430877 0.205715885 0.222680660
## mi_cv_esriHigh:log1p(group_pop) 0.0006817651 0.001328308 0.001361229
## mi_cv_esriMedium:log1p(group_pop) 0.0009425010 0.002001129 0.002091060
## mi_cv_esriLow:log1p(group_pop) 0.0060109804 0.014032752 0.015248246
## mi_cv_esriHigh:log1p(dens) 0.0029777193 0.003805163 0.002574415
## mi_cv_esriMedium:log1p(dens) 0.0033277284 0.004455829 0.003476595
## mi_cv_esriLow:log1p(dens) 0.0130541070 0.029971163 0.032094623
## ========================================================
## Z-values:
## Direct Indirect Total
## mi_cv_esriHigh -50.2092326 3.78823977 -64.6486294
## mi_cv_esriMedium -43.1696639 7.67624530 -29.6209699
## mi_cv_esriLow -4.7332832 0.83725137 -1.1459982
## mi_cv_esriHigh:log1p(vacancy_rate) 2.8509179 0.99884909 3.9043406
## mi_cv_esriMedium:log1p(vacancy_rate) 1.7869003 4.41523394 5.9444838
## mi_cv_esriLow:log1p(vacancy_rate) -0.4585206 -0.34439601 -0.5027449
## mi_cv_esriHigh:log1p(old_rate) 37.9075113 -5.44859520 16.1659789
## mi_cv_esriMedium:log1p(old_rate) 38.8538652 -7.31482238 11.2870067
## mi_cv_esriLow:log1p(old_rate) 2.4437357 -0.36700357 0.6273317
## mi_cv_esriHigh:log1p(black_rate) 9.1455622 2.05895371 13.6239681
## mi_cv_esriMedium:log1p(black_rate) 7.5124051 -4.28564119 1.6105943
## mi_cv_esriLow:log1p(black_rate) 0.9018627 -1.52922395 -1.0804059
## mi_cv_esriHigh:log1p(hisp_rate) 11.8248017 -0.04297494 15.6470416
## mi_cv_esriMedium:log1p(hisp_rate) 9.8886010 -6.78941336 0.8927603
## mi_cv_esriLow:log1p(hisp_rate) 1.4448782 -1.35196643 -0.6446027
## mi_cv_esriHigh:log1p(group_pop) 15.6171649 3.58016342 11.3153636
## mi_cv_esriMedium:log1p(group_pop) 9.4313516 0.97479398 5.1838527
## mi_cv_esriLow:log1p(group_pop) 2.8830307 -0.56603789 0.6155968
## mi_cv_esriHigh:log1p(dens) 5.2338823 3.25369239 10.8629944
## mi_cv_esriMedium:log1p(dens) 1.0908920 -2.49984973 -2.1597888
## mi_cv_esriLow:log1p(dens) -0.6352918 -0.27480708 -0.5150226
##
## p-values:
## Direct Indirect Total
## mi_cv_esriHigh < 2.22e-16 0.00015172 < 2.22e-16
## mi_cv_esriMedium < 2.22e-16 1.6431e-14 < 2.22e-16
## mi_cv_esriLow 2.2092e-06 0.40245129 0.251796
## mi_cv_esriHigh:log1p(vacancy_rate) 0.0043593 0.31786780 9.4483e-05
## mi_cv_esriMedium:log1p(vacancy_rate) 0.0739536 1.0090e-05 2.7733e-09
## mi_cv_esriLow:log1p(vacancy_rate) 0.6465785 0.73054849 0.615144
## mi_cv_esriHigh:log1p(old_rate) < 2.22e-16 5.0769e-08 < 2.22e-16
## mi_cv_esriMedium:log1p(old_rate) < 2.22e-16 2.5779e-13 < 2.22e-16
## mi_cv_esriLow:log1p(old_rate) 0.0145361 0.71361636 0.530442
## mi_cv_esriHigh:log1p(black_rate) < 2.22e-16 0.03949867 < 2.22e-16
## mi_cv_esriMedium:log1p(black_rate) 5.7954e-14 1.8221e-05 0.107268
## mi_cv_esriLow:log1p(black_rate) 0.3671298 0.12620893 0.279961
## mi_cv_esriHigh:log1p(hisp_rate) < 2.22e-16 0.96572151 < 2.22e-16
## mi_cv_esriMedium:log1p(hisp_rate) < 2.22e-16 1.1259e-11 0.371986
## mi_cv_esriLow:log1p(hisp_rate) 0.1484921 0.17638605 0.519185
## mi_cv_esriHigh:log1p(group_pop) < 2.22e-16 0.00034338 < 2.22e-16
## mi_cv_esriMedium:log1p(group_pop) < 2.22e-16 0.32966247 2.1735e-07
## mi_cv_esriLow:log1p(group_pop) 0.0039387 0.57136803 0.538161
## mi_cv_esriHigh:log1p(dens) 1.6599e-07 0.00113916 < 2.22e-16
## mi_cv_esriMedium:log1p(dens) 0.2753204 0.01242460 0.030789
## mi_cv_esriLow:log1p(dens) 0.5252381 0.78346446 0.606537
Again, the conditioned representation yields an improved fit, but of course ESRI reliability is directly cut from the response (CV):
cat(capture.output(print(anova(SLX, SLXc)))[c(1, 2, 27:31)], sep="\n") # anova(SLX, SLXc)
## Analysis of Variance Table
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 71340 14581.4
## 2 71311 6568.9 29 8012.5 2999.4 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a larger number of observations, using eigenvalues to calculate the Jacobian is not possible, so we use a pre-computed Cholesky decomposition:
SEM <- spatialreg::errorsarlm(tr_form, data=df_tracts, listw=lw, method="Matrix", zero.policy=TRUE)
apply(SEM$timings, 2, sum)
## user.self elapsed
## 4.424 4.497
summary(SEM, Hausman=TRUE, Nagelkerke=TRUE)
##
## Call:spatialreg::errorsarlm(formula = tr_form, data = df_tracts, listw = lw,
## method = "Matrix", zero.policy = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.161891 -0.287439 0.010315 0.296182 2.979651
##
## Type: error
## Regions with no neighbours included:
## 12745 26628 29077 29120 32398 32775 43104 46344 46390 52303 58113 58261 70000 70055 70058 70457 70458
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.25184400 0.03703706 -168.800 < 2.2e-16
## log1p(vacancy_rate) 0.35134711 0.02829437 12.418 < 2.2e-16
## log1p(old_rate) 0.41108547 0.00418162 98.308 < 2.2e-16
## log1p(black_rate) 0.54672194 0.01341739 40.747 < 2.2e-16
## log1p(hisp_rate) 0.59289677 0.01501856 39.478 < 2.2e-16
## log1p(group_pop) 0.02045655 0.00080153 25.522 < 2.2e-16
## log1p(dens) 0.03120040 0.00220509 14.149 < 2.2e-16
##
## Lambda: 0.25388, LR test value: 2082.7, p-value: < 2.22e-16
## Approximate (numerical Hessian) standard error: 0.0079519
## z-value: 31.927, p-value: < 2.22e-16
## Wald statistic: 1019.3, p-value: < 2.22e-16
##
## Log likelihood: -43642.84 for error model
## ML residual variance (sigma squared): 0.19674, (sigma: 0.44356)
## Nagelkerke pseudo-R-squared: 0.25173
## Number of observations: 71353
## Number of parameters estimated: 9
## AIC: 87304, (AIC for lm: 89384)
## Hausman test: 171.45, df: 7, p-value: < 2.22e-16
We can also use single-chain MCMC with default 2500 draws, of which 500 are treated as burn-in and discarded; much of the time is spent on pre-computing a coarse grid of LU-decomposition Jacobian terms:
set.seed(1)
BSEM <- spatialreg::spBreg_err(tr_form, data=df_tracts, listw=lw, zero.policy=TRUE)
apply(attr(BSEM, "timings")[,c(1,3)], 2, sum)
## user.self elapsed
## 115.015 116.260
As we know, the model MCMC output is held in an "mcmc"
object defined in coda:
library(coda)
summary(BSEM)
##
## Iterations = 501:2500
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 2000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## (Intercept) -6.25240 0.0368774 8.246e-04 8.246e-04
## log1p(vacancy_rate) 0.35138 0.0288813 6.458e-04 6.458e-04
## log1p(old_rate) 0.41115 0.0041649 9.313e-05 9.313e-05
## log1p(black_rate) 0.54719 0.0137979 3.085e-04 3.085e-04
## log1p(hisp_rate) 0.59254 0.0154012 3.444e-04 3.444e-04
## log1p(group_pop) 0.02045 0.0007997 1.788e-05 1.929e-05
## log1p(dens) 0.03117 0.0022222 4.969e-05 4.969e-05
## lambda 0.25291 0.0053490 1.196e-04 1.196e-04
## sige 0.20019 0.0012118 2.710e-05 2.710e-05
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## (Intercept) -6.32307 -6.27707 -6.25300 -6.22796 -6.17848
## log1p(vacancy_rate) 0.29587 0.33131 0.35145 0.37149 0.40597
## log1p(old_rate) 0.40290 0.40839 0.41116 0.41394 0.41910
## log1p(black_rate) 0.51979 0.53798 0.54742 0.55630 0.57370
## log1p(hisp_rate) 0.56259 0.58228 0.59234 0.60257 0.62334
## log1p(group_pop) 0.01887 0.01990 0.02045 0.02099 0.02201
## log1p(dens) 0.02684 0.02966 0.03119 0.03271 0.03541
## lambda 0.24262 0.24962 0.25263 0.25663 0.26363
## sige 0.19800 0.19944 0.20018 0.20088 0.20234
Diagnostic tests from coda can also be used:
raftery.diag(BSEM, r=0.01)
##
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.01
## Probability (s) = 0.95
##
## Burn-in Total Lower bound Dependence
## (M) (N) (Nmin) factor (I)
## (Intercept) 2 930 937 0.993
## log1p(vacancy_rate) 2 950 937 1.010
## log1p(old_rate) 2 930 937 0.993
## log1p(black_rate) 2 913 937 0.974
## log1p(hisp_rate) 2 892 937 0.952
## log1p(group_pop) 2 930 937 0.993
## log1p(dens) 2 950 937 1.010
## lambda 1 1190 937 1.270
## sige 2 930 937 0.993
plot(BSEM, trace=FALSE)
We can also use weights with SEM and conditioned SEM models:
SEMw <- spatialreg::errorsarlm(tr_form, weights=tot_pop, data=df_tracts, listw=lw, method="Matrix", zero.policy=TRUE)
apply(SEMw$timings, 2, sum)
## user.self elapsed
## 4.089 4.101
summary(SEMw, Hausman=TRUE, Nagelkerke=TRUE)
##
## Call:spatialreg::errorsarlm(formula = tr_form, data = df_tracts, listw = lw,
## weights = tot_pop, method = "Matrix", zero.policy = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1721397 -0.2875481 0.0096908 0.2949229 2.9755875
##
## Type: error
## Regions with no neighbours included:
## 12745 26628 29077 29120 32398 32775 43104 46344 46390 52303 58113 58261 70000 70055 70058 70457 70458
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.25080288 0.03640540 -171.700 < 2.2e-16
## log1p(vacancy_rate) 0.49017557 0.03116976 15.726 < 2.2e-16
## log1p(old_rate) 0.41003992 0.00422318 97.093 < 2.2e-16
## log1p(black_rate) 0.50494215 0.01384960 36.459 < 2.2e-16
## log1p(hisp_rate) 0.57340744 0.01406260 40.775 < 2.2e-16
## log1p(group_pop) 0.02242876 0.00075294 29.788 < 2.2e-16
## log1p(dens) 0.02993784 0.00213535 14.020 < 2.2e-16
##
## Lambda: 0.24966, LR test value: 2032, p-value: < 2.22e-16
## Approximate (numerical Hessian) standard error: 0.0038204
## z-value: 65.349, p-value: < 2.22e-16
## Wald statistic: 4270.5, p-value: < 2.22e-16
##
## Log likelihood: -44437.41 for error model
## ML residual variance (sigma squared): 780.56, (sigma: 27.938)
## Nagelkerke pseudo-R-squared: 0.23488
## Number of observations: 71353
## Number of parameters estimated: 9
## AIC: 88893, (AIC for lm: 90923)
## Hausman test: 208.19, df: 7, p-value: < 2.22e-16
SEMc <- spatialreg::errorsarlm(ctr_form, weights=tot_pop, data=df_tracts, listw=lw, method="Matrix", zero.policy=TRUE)
apply(SEMc$timings, 2, sum)
## user.self elapsed
## 16.977 17.148
summary(SEMc, Hausman=TRUE, Nagelkerke=TRUE)
##
## Call:spatialreg::errorsarlm(formula = ctr_form, data = df_tracts,
## listw = lw, weights = tot_pop, method = "Matrix", zero.policy = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.875210 -0.189846 0.010025 0.218619 1.589451
##
## Type: error
## Regions with no neighbours included:
## 12745 26628 29077 29120 32398 32775 43104 46344 46390 52303 58113 58261 70000 70055 70058 70457 70458
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value
## mi_cv_esriHigh -4.50194913 0.03081245 -146.1081
## mi_cv_esriMedium -3.52293575 0.05244773 -67.1704
## mi_cv_esriLow -1.51636186 0.50068998 -3.0285
## mi_cv_esriHigh:log1p(vacancy_rate) 0.24662900 0.02483581 9.9304
## mi_cv_esriMedium:log1p(vacancy_rate) 0.17373501 0.03654076 4.7546
## mi_cv_esriLow:log1p(vacancy_rate) -0.08408966 0.26643804 -0.3156
## mi_cv_esriHigh:log1p(old_rate) 0.20108446 0.00359308 55.9644
## mi_cv_esriMedium:log1p(old_rate) 0.17805021 0.00588966 30.2310
## mi_cv_esriLow:log1p(old_rate) 0.07629282 0.05097803 1.4966
## mi_cv_esriHigh:log1p(black_rate) 0.22727812 0.01145022 19.8492
## mi_cv_esriMedium:log1p(black_rate) 0.16015804 0.01387312 11.5445
## mi_cv_esriLow:log1p(black_rate) 0.12311592 0.10298571 1.1955
## mi_cv_esriHigh:log1p(hisp_rate) 0.31728054 0.01113463 28.4949
## mi_cv_esriMedium:log1p(hisp_rate) 0.19542720 0.01472698 13.2700
## mi_cv_esriLow:log1p(hisp_rate) 0.09478542 0.12665203 0.7484
## mi_cv_esriHigh:log1p(group_pop) 0.01317602 0.00063026 20.9056
## mi_cv_esriMedium:log1p(group_pop) 0.01054448 0.00100731 10.4680
## mi_cv_esriLow:log1p(group_pop) 0.02180365 0.00879205 2.4799
## mi_cv_esriHigh:log1p(dens) 0.01785684 0.00169428 10.5395
## mi_cv_esriMedium:log1p(dens) 0.00869646 0.00228621 3.8039
## mi_cv_esriLow:log1p(dens) -0.00350301 0.01820832 -0.1924
## Pr(>|z|)
## mi_cv_esriHigh < 2.2e-16
## mi_cv_esriMedium < 2.2e-16
## mi_cv_esriLow 0.0024573
## mi_cv_esriHigh:log1p(vacancy_rate) < 2.2e-16
## mi_cv_esriMedium:log1p(vacancy_rate) 1.989e-06
## mi_cv_esriLow:log1p(vacancy_rate) 0.7523010
## mi_cv_esriHigh:log1p(old_rate) < 2.2e-16
## mi_cv_esriMedium:log1p(old_rate) < 2.2e-16
## mi_cv_esriLow:log1p(old_rate) 0.1345020
## mi_cv_esriHigh:log1p(black_rate) < 2.2e-16
## mi_cv_esriMedium:log1p(black_rate) < 2.2e-16
## mi_cv_esriLow:log1p(black_rate) 0.2319050
## mi_cv_esriHigh:log1p(hisp_rate) < 2.2e-16
## mi_cv_esriMedium:log1p(hisp_rate) < 2.2e-16
## mi_cv_esriLow:log1p(hisp_rate) 0.4542235
## mi_cv_esriHigh:log1p(group_pop) < 2.2e-16
## mi_cv_esriMedium:log1p(group_pop) < 2.2e-16
## mi_cv_esriLow:log1p(group_pop) 0.0131409
## mi_cv_esriHigh:log1p(dens) < 2.2e-16
## mi_cv_esriMedium:log1p(dens) 0.0001425
## mi_cv_esriLow:log1p(dens) 0.8474405
##
## Lambda: 0.1595, LR test value: 707.11, p-value: < 2.22e-16
## Approximate (numerical Hessian) standard error: 0.0028702
## z-value: 55.573, p-value: < 2.22e-16
## Wald statistic: 3088.3, p-value: < 2.22e-16
##
## Log likelihood: -20613.76 for error model
## ML residual variance (sigma squared): 402.95, (sigma: 20.074)
## Number of observations: 71353
## Number of parameters estimated: 23
## AIC: 41274, (AIC for lm: 41979)
## Hausman test: 855.85, df: 21, p-value: < 2.22e-16
anova(SEM, SEMc)
## Model df AIC logLik Test L.Ratio p-value
## SEM 1 9 87304 -43643 1
## SEMc 2 23 41274 -20614 2 46058 0
and its Durbin variant:
SDEM <- spatialreg::errorsarlm(tr_form, data=df_tracts, listw=lw, method="Matrix", Durbin=TRUE, zero.policy=TRUE)
apply(SDEM$timings, 2, sum)
## user.self elapsed
## 8.973 8.997
summary(SDEM, Nagelkerke=TRUE)
##
## Call:spatialreg::errorsarlm(formula = tr_form, data = df_tracts, listw = lw,
## Durbin = TRUE, method = "Matrix", zero.policy = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.153522 -0.286920 0.010042 0.295821 2.932170
##
## Type: error
## Regions with no neighbours included:
## 12745 26628 29077 29120 32398 32775 43104 46344 46390 52303 58113 58261 70000 70055 70058 70457 70458
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.16017866 0.06471718 -95.1861 < 2.2e-16
## log1p(vacancy_rate) 0.22011019 0.03746441 5.8752 4.224e-09
## log1p(old_rate) 0.40323063 0.00432342 93.2666 < 2.2e-16
## log1p(black_rate) 0.50755974 0.02561676 19.8136 < 2.2e-16
## log1p(hisp_rate) 0.62902991 0.03218041 19.5470 < 2.2e-16
## log1p(group_pop) 0.01986287 0.00080835 24.5722 < 2.2e-16
## log1p(dens) 0.00833251 0.00399191 2.0873 0.03686
## lag.log1p(vacancy_rate) 0.33910367 0.05237609 6.4744 9.519e-11
## lag.log1p(old_rate) -0.00961209 0.00682877 -1.4076 0.15925
## lag.log1p(black_rate) 0.03009720 0.03046327 0.9880 0.32316
## lag.log1p(hisp_rate) -0.06474559 0.03688723 -1.7552 0.07922
## lag.log1p(group_pop) 0.01440312 0.00167140 8.6174 < 2.2e-16
## lag.log1p(dens) 0.03495989 0.00475808 7.3475 2.021e-13
##
## Lambda: 0.25272, LR test value: 2070, p-value: < 2.22e-16
## Approximate (numerical Hessian) standard error: 0.0042256
## z-value: 59.807, p-value: < 2.22e-16
## Wald statistic: 3576.8, p-value: < 2.22e-16
##
## Log likelihood: -43560.04 for error model
## ML residual variance (sigma squared): 0.19631, (sigma: 0.44307)
## Nagelkerke pseudo-R-squared: 0.25347
## Number of observations: 71353
## Number of parameters estimated: 15
## AIC: 87150, (AIC for lm: 89218)
summary(spatialreg::impacts(SDEM))
## Impact measures (SDEM, estimable, n):
## Direct Indirect Total
## log1p(vacancy_rate) 0.220110186 0.339103674 0.55921386
## log1p(old_rate) 0.403230628 -0.009612094 0.39361853
## log1p(black_rate) 0.507559744 0.030097197 0.53765694
## log1p(hisp_rate) 0.629029914 -0.064745592 0.56428432
## log1p(group_pop) 0.019862873 0.014403116 0.03426599
## log1p(dens) 0.008332506 0.034959895 0.04329240
## ========================================================
## Standard errors:
## Direct Indirect Total
## log1p(vacancy_rate) 0.0374644083 0.052376090 0.040633564
## log1p(old_rate) 0.0043234188 0.006828774 0.007484115
## log1p(black_rate) 0.0256167606 0.030463269 0.016154547
## log1p(hisp_rate) 0.0321804058 0.036887235 0.017557548
## log1p(group_pop) 0.0008083488 0.001671399 0.001803419
## log1p(dens) 0.0039919138 0.004758078 0.002749267
## ========================================================
## Z-values:
## Direct Indirect Total
## log1p(vacancy_rate) 5.875181 6.4743984 13.76236
## log1p(old_rate) 93.266613 -1.4075870 52.59387
## log1p(black_rate) 19.813580 0.9879832 33.28208
## log1p(hisp_rate) 19.546985 -1.7552303 32.13913
## log1p(group_pop) 24.572157 8.6174018 19.00057
## log1p(dens) 2.087346 7.3474828 15.74689
##
## p-values:
## Direct Indirect Total
## log1p(vacancy_rate) 4.2238e-09 9.5191e-11 < 2.22e-16
## log1p(old_rate) < 2.22e-16 0.15925 < 2.22e-16
## log1p(black_rate) < 2.22e-16 0.32316 < 2.22e-16
## log1p(hisp_rate) < 2.22e-16 0.07922 < 2.22e-16
## log1p(group_pop) < 2.22e-16 < 2.22e-16 < 2.22e-16
## log1p(dens) 0.036857 2.0206e-13 < 2.22e-16
and weighted:
SDEMw <- spatialreg::errorsarlm(tr_form, weights=tot_pop, data=df_tracts, listw=lw, method="Matrix", Durbin=TRUE, zero.policy=TRUE)
apply(SDEMw$timings, 2, sum)
## user.self elapsed
## 8.204 8.228
summary(SDEMw, Hausman=TRUE, Nagelkerke=TRUE)
##
## Call:spatialreg::errorsarlm(formula = tr_form, data = df_tracts, listw = lw,
## weights = tot_pop, Durbin = TRUE, method = "Matrix", zero.policy = TRUE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1665752 -0.2883039 0.0096818 0.2960586 2.9234058
##
## Type: error
## Regions with no neighbours included:
## 12745 26628 29077 29120 32398 32775 43104 46344 46390 52303 58113 58261 70000 70055 70058 70457 70458
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.26779399 0.06438165 -97.3537 < 2.2e-16
## log1p(vacancy_rate) 0.29917270 0.04034320 7.4157 1.210e-13
## log1p(old_rate) 0.40182015 0.00435840 92.1944 < 2.2e-16
## log1p(black_rate) 0.48896909 0.02617479 18.6809 < 2.2e-16
## log1p(hisp_rate) 0.61656157 0.03094209 19.9263 < 2.2e-16
## log1p(group_pop) 0.02144233 0.00076196 28.1408 < 2.2e-16
## log1p(dens) 0.01417781 0.00388774 3.6468 0.0002655
## lag.log1p(vacancy_rate) 0.40922588 0.05309595 7.7073 1.288e-14
## lag.log1p(old_rate) 0.00219764 0.00697219 0.3152 0.7526096
## lag.log1p(black_rate) 0.00270443 0.03096953 0.0873 0.9304128
## lag.log1p(hisp_rate) -0.06315358 0.03539952 -1.7840 0.0744198
## lag.log1p(group_pop) 0.01860324 0.00163288 11.3929 < 2.2e-16
## lag.log1p(dens) 0.02522840 0.00467896 5.3919 6.972e-08
##
## Lambda: 0.24893, LR test value: 2029.7, p-value: < 2.22e-16
## Approximate (numerical Hessian) standard error: 0.0056695
## z-value: 43.906, p-value: < 2.22e-16
## Wald statistic: 1927.8, p-value: < 2.22e-16
##
## Log likelihood: -44328.16 for error model
## ML residual variance (sigma squared): 778.22, (sigma: 27.897)
## Nagelkerke pseudo-R-squared: 0.23722
## Number of observations: 71353
## Number of parameters estimated: 15
## AIC: 88686, (AIC for lm: 90714)
## Hausman test: 67.692, df: 13, p-value: 2.1293e-09
summary(spatialreg::impacts(SDEMw))
## Impact measures (SDEM, estimable, n):
## Direct Indirect Total
## log1p(vacancy_rate) 0.29917270 0.409225875 0.70839857
## log1p(old_rate) 0.40182015 0.002197637 0.40401779
## log1p(black_rate) 0.48896909 0.002704431 0.49167353
## log1p(hisp_rate) 0.61656157 -0.063153582 0.55340799
## log1p(group_pop) 0.02144233 0.018603237 0.04004557
## log1p(dens) 0.01417781 0.025228403 0.03940622
## ========================================================
## Standard errors:
## Direct Indirect Total
## log1p(vacancy_rate) 0.040343204 0.053095951 0.042489080
## log1p(old_rate) 0.004358399 0.006972192 0.007509624
## log1p(black_rate) 0.026174793 0.030969533 0.016508152
## log1p(hisp_rate) 0.030942095 0.035399521 0.016387297
## log1p(group_pop) 0.000761965 0.001632879 0.001739686
## log1p(dens) 0.003887738 0.004678957 0.002686349
## ========================================================
## Z-values:
## Direct Indirect Total
## log1p(vacancy_rate) 7.415690 7.70728964 16.67249
## log1p(old_rate) 92.194431 0.31520029 53.80000
## log1p(black_rate) 18.680915 0.08732554 29.78368
## log1p(hisp_rate) 19.926303 -1.78402361 33.77055
## log1p(group_pop) 28.140837 11.39290899 23.01884
## log1p(dens) 3.646803 5.39188581 14.66906
##
## p-values:
## Direct Indirect Total
## log1p(vacancy_rate) 1.2101e-13 1.2879e-14 < 2.22e-16
## log1p(old_rate) < 2.22e-16 0.75261 < 2.22e-16
## log1p(black_rate) < 2.22e-16 0.93041 < 2.22e-16
## log1p(hisp_rate) < 2.22e-16 0.07442 < 2.22e-16
## log1p(group_pop) < 2.22e-16 < 2.22e-16 < 2.22e-16
## log1p(dens) 0.00026552 6.9722e-08 < 2.22e-16
Finally, using the development version of sphet:
library(sphet)
##
## Attaching package: 'sphet'
## The following object is masked from 'package:spatialreg':
##
## impacts
system.time(SEM_GMM <- spreg(tr_form, data=df_tracts, listw=as(lw, "CsparseMatrix"), model="error", het=FALSE))
## user system elapsed
## 1.902 0.058 1.966
summary(SEM_GMM)
##
## Generalized stsls
##
## Call:
## spreg(formula = tr_form, data = df_tracts, listw = as(lw, "CsparseMatrix"),
## model = "error", het = FALSE)
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.15222 -0.29469 0.01032 -0.00021 0.30086 3.00213
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -6.24575367 0.03720516 -167.873 < 2.2e-16 ***
## log1p(vacancy_rate) 0.35157200 0.02843557 12.364 < 2.2e-16 ***
## log1p(old_rate) 0.41042586 0.00419882 97.748 < 2.2e-16 ***
## log1p(black_rate) 0.54682738 0.01352155 40.441 < 2.2e-16 ***
## log1p(hisp_rate) 0.59271015 0.01514450 39.137 < 2.2e-16 ***
## log1p(group_pop) 0.02040826 0.00080207 25.444 < 2.2e-16 ***
## log1p(dens) 0.03112745 0.00222134 14.013 < 2.2e-16 ***
## rho 0.26259335 0.00477093 55.040 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
with its Durbin variant:
system.time(SDEM_GMM <- spreg(tr_form, data=df_tracts, listw=as(lw, "CsparseMatrix"), model="error", het=FALSE, Durbin=TRUE))
## user system elapsed
## 2.133 0.078 2.216
summary(SDEM_GMM)
##
## Generalized stsls
##
## Call:
## spreg(formula = tr_form, data = df_tracts, listw = as(lw, "CsparseMatrix"),
## model = "error", het = FALSE, Durbin = TRUE)
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.14258 -0.29470 0.00949 -0.00075 0.29963 2.94460
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -6.13268960 0.06558835 -93.5027 < 2.2e-16 ***
## log1p(vacancy_rate) 0.22111422 0.03740353 5.9116 3.388e-09 ***
## log1p(old_rate) 0.40241917 0.00432891 92.9608 < 2.2e-16 ***
## log1p(black_rate) 0.50792821 0.02557387 19.8612 < 2.2e-16 ***
## log1p(hisp_rate) 0.62901587 0.03212850 19.5781 < 2.2e-16 ***
## log1p(group_pop) 0.01983835 0.00080826 24.5445 < 2.2e-16 ***
## log1p(dens) 0.00826926 0.00398581 2.0747 0.03802 *
## lag_log1p(vacancy_rate) 0.34382296 0.05251777 6.5468 5.879e-11 ***
## lag_log1p(old_rate) -0.01195826 0.00688147 -1.7377 0.08226 .
## lag_log1p(black_rate) 0.03012242 0.03048615 0.9881 0.32312
## lag_log1p(hisp_rate) -0.06631076 0.03690239 -1.7969 0.07235 .
## lag_log1p(group_pop) 0.01435628 0.00167892 8.5509 < 2.2e-16 ***
## lag_log1p(dens) 0.03521147 0.00476425 7.3908 1.460e-13 ***
## rho 0.26161661 0.00476485 54.9055 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This also lets us adjust the coefficient standard errors for heterskedasticity:
system.time(SEM_GMMh <- spreg(tr_form, data=df_tracts, listw=as(lw, "CsparseMatrix"), model="error", het=TRUE))
## user system elapsed
## 2.660 0.036 2.703
summary(SEM_GMMh)
##
## Generalized stsls
##
## Call:
## spreg(formula = tr_form, data = df_tracts, listw = as(lw, "CsparseMatrix"),
## model = "error", het = TRUE)
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.15225 -0.29467 0.01030 -0.00021 0.30090 3.00219
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -6.24579581 0.03996340 -156.288 < 2.2e-16 ***
## log1p(vacancy_rate) 0.35123442 0.03135236 11.203 < 2.2e-16 ***
## log1p(old_rate) 0.41044090 0.00456026 90.004 < 2.2e-16 ***
## log1p(black_rate) 0.54681845 0.01429321 38.257 < 2.2e-16 ***
## log1p(hisp_rate) 0.59276338 0.01559615 38.007 < 2.2e-16 ***
## log1p(group_pop) 0.02040097 0.00083032 24.570 < 2.2e-16 ***
## log1p(dens) 0.03109688 0.00232960 13.349 < 2.2e-16 ***
## rho 0.27593286 0.00488909 56.438 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
system.time(SDEM_GMMh <- spreg(tr_form, data=df_tracts, listw=as(lw, "CsparseMatrix"), model="error", het=TRUE, Durbin=TRUE))
## user system elapsed
## 3.207 0.095 3.311
summary(SDEM_GMMh)
##
## Generalized stsls
##
## Call:
## spreg(formula = tr_form, data = df_tracts, listw = as(lw, "CsparseMatrix"),
## model = "error", het = TRUE, Durbin = TRUE)
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.14259 -0.29469 0.00947 -0.00076 0.29964 2.94460
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -6.13295558 0.06908374 -88.7757 < 2.2e-16 ***
## log1p(vacancy_rate) 0.22118334 0.04042524 5.4714 4.465e-08 ***
## log1p(old_rate) 0.40242167 0.00468697 85.8597 < 2.2e-16 ***
## log1p(black_rate) 0.50798180 0.02743648 18.5148 < 2.2e-16 ***
## log1p(hisp_rate) 0.62911182 0.03381343 18.6054 < 2.2e-16 ***
## log1p(group_pop) 0.01983960 0.00083555 23.7443 < 2.2e-16 ***
## log1p(dens) 0.00825998 0.00410720 2.0111 0.04432 *
## lag_log1p(vacancy_rate) 0.34365182 0.05502973 6.2448 4.242e-10 ***
## lag_log1p(old_rate) -0.01192960 0.00729841 -1.6345 0.10214
## lag_log1p(black_rate) 0.03000758 0.03268209 0.9182 0.35853
## lag_log1p(hisp_rate) -0.06644231 0.03895247 -1.7057 0.08806 .
## lag_log1p(group_pop) 0.01436527 0.00171910 8.3563 < 2.2e-16 ***
## lag_log1p(dens) 0.03521675 0.00491840 7.1602 8.056e-13 ***
## rho 0.27495975 0.00488766 56.2560 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
sessionInfo()
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Fedora 34 (Workstation Edition)
##
## Matrix products: default
## BLAS: /home/rsb/topics/R/R410-share/lib64/R/lib/libRblas.so
## LAPACK: /home/rsb/topics/R/R410-share/lib64/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
## [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
## [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] sphet_1.11 coda_0.19-4 spatialreg_1.1-8 Matrix_1.3-3
## [5] spData_0.3.8 sf_0.9-9
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 mvtnorm_1.1-1 lattice_0.20-44 tidyr_1.1.3
## [5] deldir_0.2-10 class_7.3-19 zoo_1.8-9 gtools_3.8.2
## [9] assertthat_0.2.1 digest_0.6.27 lmtest_0.9-38 utf8_1.2.1
## [13] RSpectra_0.16-0 spDataLarge_0.5.1 R6_2.5.0 backports_1.2.1
## [17] evaluate_0.14 e1071_1.7-7 highr_0.9 pillar_1.6.1
## [21] rlang_0.4.11 spdep_1.1-8 rstudioapi_0.13 gdata_2.18.0
## [25] raster_3.4-10 jquerylib_0.1.4 gmodels_2.18.1 rmarkdown_2.8
## [29] splines_4.1.0 stringr_1.4.0 proxy_0.4-25 broom_0.7.6
## [33] compiler_4.1.0 xfun_0.23 pkgconfig_2.0.3 htmltools_0.5.1.1
## [37] tidyselect_1.1.1 tibble_3.1.2 expm_0.999-6 codetools_0.2-18
## [41] matrixStats_0.58.0 fansi_0.4.2 crayon_1.4.1 dplyr_1.0.6
## [45] MASS_7.3-54 grid_4.1.0 nlme_3.1-152 jsonlite_1.7.2
## [49] lifecycle_1.0.0 DBI_1.1.1 magrittr_2.0.1 units_0.7-1
## [53] KernSmooth_2.23-20 cli_2.5.0 stringi_1.6.2 LearnBayes_2.15.1
## [57] sp_1.4-5 bslib_0.2.5.1 ellipsis_0.3.2 generics_0.1.0
## [61] vctrs_0.3.8 boot_1.3-28 tools_4.1.0 glue_1.4.2
## [65] purrr_0.3.4 yaml_2.2.1 classInt_0.4-3 knitr_1.33
## [69] sass_0.4.0