Cross-validation of bandwidth for geographically weighted regression

The function finds a bandwidth for a given geographically weighted regression by optimizing a selected function. For cross-validation, this scores the prediction error of the drop-1 geographically weighted regressions (the sum of squared CV errors by default, or the root mean square prediction error when RMSE=TRUE) and chooses the bandwidth that minimizes this quantity.
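The idea behind the cross-validation score can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper `cv_score`, the synthetic data, the Gaussian kernel form, and the search interval are all assumptions made for illustration.

```r
# Hypothetical helper: drop-1 CV score for one fixed bandwidth.
# Assumes Euclidean distances and a Gaussian kernel exp(-0.5 * d^2 / bw^2).
cv_score <- function(bw, y, X, coords) {
  n <- length(y)
  d2 <- as.matrix(dist(coords))^2        # squared pairwise distances
  err <- numeric(n)
  for (i in seq_len(n)) {
    w <- exp(-0.5 * d2[i, ] / bw^2)      # kernel weights centred on point i
    w[i] <- 0                            # leave observation i out of its own fit
    fit <- lm.wfit(X, y, w)              # weighted least squares
    err[i] <- y[i] - sum(X[i, ] * fit$coefficients)
  }
  sum(err^2)                             # sum of squared CV errors
}

# Synthetic data (hypothetical): the slope drifts with the first coordinate
set.seed(42)
n <- 60
coords <- cbind(runif(n), runif(n))
x <- rnorm(n)
y <- 1 + (1 + 2 * coords[, 1]) * x + rnorm(n, sd = 0.3)
X <- cbind(1, x)

# Search between a fraction of the bounding-box diagonal and the diagonal itself
diag_len <- sqrt(sum(apply(coords, 2, function(v) diff(range(v)))^2))
opt <- optimize(cv_score, interval = c(diag_len / 10, diag_len),
                y = y, X = X, coords = coords)
opt$minimum    # bandwidth with the lowest CV score in the interval
opt$objective  # CV score at that bandwidth
```

Taking the square root of the mean of `err^2` instead of the sum would give the RMSE variant of the score; both are minimized at the same bandwidth.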

gwr.sel(formula, data=list(), coords, adapt=FALSE, gweight=gwr.Gauss,
 method = "cv", verbose = TRUE, longlat=NULL, RMSE=FALSE, weights,
 tol=.Machine$double.eps^0.25, show.error.messages = FALSE)

Arguments

formula: regression model formula as in lm
data: model data frame as in lm, or may be a SpatialPointsDataFrame or SpatialPolygonsDataFrame object as defined in package sp
coords: matrix of coordinates of points representing the spatial positions of the observations
adapt: if TRUE, find the proportion between 0 and 1 of observations to include in the weighting scheme (k-nearest neighbours); if FALSE (default), find a global bandwidth
gweight: geographical weighting function; at present gwr.Gauss() (the default), gwr.gauss() (the previous default), or gwr.bisquare()
method: default "cv" for drop-1 cross-validation, or "aic" for AIC optimisation (depends on assumptions about AIC degrees of freedom)
verbose: if TRUE (default), reports the progress of search for bandwidth
longlat: TRUE if point coordinates are longitude-latitude decimal degrees, in which case distances are measured in kilometers; if data is a SpatialPointsDataFrame object, the value is taken from the object itself
RMSE: default FALSE, so that scores correspond to the CV scores in newer references (sum of squared CV errors); if TRUE, the previous behaviour of scoring by LOO CV RMSE is used
weights: case weights, used as in weighted least squares; beware of scaling issues. Only used with the cross-validation method, and probably unsafe
tol: the desired accuracy to be passed to optimize
show.error.messages: default FALSE; may be set to TRUE to see error messages if gwr.sel returns without a value
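To make the gweight choices concrete, the sketch below compares three kernel shapes on a few distances. The function names are hypothetical stand-ins, and the formulas are the usual textbook forms that gwr.Gauss, gwr.gauss and gwr.bisquare are assumed to follow (each taking squared distances); check the package source before relying on an exact match.

```r
# Hypothetical stand-ins; dist2 is a vector of squared distances
gauss_half <- function(dist2, bw) exp(-0.5 * dist2 / bw^2)   # assumed form of gwr.Gauss
gauss_old  <- function(dist2, bw) exp(-dist2 / bw^2)         # assumed form of gwr.gauss
bisquare   <- function(dist2, bw) {
  # Weight falls to exactly zero beyond the bandwidth
  ifelse(dist2 > bw^2, 0, (1 - dist2 / bw^2)^2)
}

d  <- c(0, 0.5, 1, 2)   # distances from the fit point
bw <- 1
round(cbind(d,
            Gauss    = gauss_half(d^2, bw),
            gauss    = gauss_old(d^2, bw),
            bisquare = bisquare(d^2, bw)), 3)
```

Under these assumed forms the two Gaussian kernels never reach zero, whereas the bisquare kernel ignores every observation farther away than the bandwidth, so the same numeric bandwidth means something different for each kernel.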

Details

If the regression contains little pattern, the bandwidth will converge to the upper bound of the line search, which is the diagonal of the bounding box of the data point coordinates for adapt=FALSE, and 1 for adapt=TRUE; see the simulation block in the examples below.
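The upper bound for adapt=FALSE can be reproduced by hand for planar coordinates. The sketch below assumes Euclidean distances (with longlat=TRUE the distance would be measured differently), and both the toy coordinates and the helper `at_upper_bound` are hypothetical.

```r
# Toy coordinates (hypothetical); any two-column matrix of planar coordinates works
coords <- cbind(x = c(0, 3, 1, 2), y = c(0, 4, 2, 1))

# Bounding box: row 1 holds the minima, row 2 the maxima of each column
bb <- apply(coords, 2, range)

# Euclidean length of the bounding-box diagonal
upper <- sqrt(sum((bb[2, ] - bb[1, ])^2))
upper  # 5

# Hypothetical check: flag a selected bandwidth that sits on the bound
at_upper_bound <- function(bw, upper, tol = 1e-3) abs(bw - upper) < tol

# Values taken from the two columbus runs in the Examples section
at_upper_bound(33.07037, 33.0704149683672)  # TRUE
at_upper_bound(2.27506, 33.0704149683672)   # FALSE
```

A bandwidth at the bound suggests that the search found no local structure worth exploiting with that kernel, so the result is better read as "effectively global" than as an estimate of spatial scale.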

Value

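Returns the bandwidth chosen by the selected method (cross-validation by default): a distance for adapt=FALSE, or a proportion of observations between 0 and 1 for adapt=TRUE.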

Note

Use of method="aic" results in the creation of an n by n matrix, and should not be chosen when n is large.
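A rough sense of what "large" means follows from the size of a dense n by n matrix. The arithmetic below assumes 8 bytes per element (double precision) and ignores any working copies, so actual memory use may be higher.

```r
# Memory needed for one dense n-by-n matrix of doubles, in GiB
n   <- c(1e3, 1e4, 1e5)
gib <- n^2 * 8 / 2^30
data.frame(n = n, GiB = signif(gib, 3))
```

Under that assumption the matrix needs about 0.75 GiB at n = 10,000 and about 75 GiB at n = 100,000.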

References

Fotheringham, A.S., Brunsdon, C., and Charlton, M.E., 2002, Geographically Weighted Regression, Chichester: Wiley

Paez, A., Farber, S., and Wheeler, D., 2011, "A simulation-based study of geographically weighted regression as a method for investigating spatially varying relationships", Environment and Planning A 43(12) 2992-3010

http://gwr.nuim.ie/

Author

Roger Bivand Roger.Bivand@nhh.no

Examples

library(spgwr)
data(columbus, package="spData")
gwr.sel(CRIME ~ INC + HOVAL, data=columbus,
  coords=cbind(columbus$X, columbus$Y))
#> Bandwidth: 12.65221 CV score: 7432.209 
#> Bandwidth: 20.45127 CV score: 7462.704 
#> Bandwidth: 7.83213 CV score: 7323.545 
#> Bandwidth: 4.853154 CV score: 7307.57 
#> Bandwidth: 5.125504 CV score: 7322.796 
#> Bandwidth: 3.012046 CV score: 6461.764 
#> Bandwidth: 1.874179 CV score: 6473.378 
#> Bandwidth: 2.475485 CV score: 6109.995 
#> Bandwidth: 2.447721 CV score: 6098.372 
#> Bandwidth: 2.228647 CV score: 6064.1 
#> Bandwidth: 2.264538 CV score: 6060.774 
#> Bandwidth: 2.280666 CV score: 6060.649 
#> Bandwidth: 2.274969 CV score: 6060.601 
#> Bandwidth: 2.2751 CV score: 6060.601 
#> Bandwidth: 2.27506 CV score: 6060.601 
#> Bandwidth: 2.275019 CV score: 6060.601 
#> Bandwidth: 2.27506 CV score: 6060.601 
#> [1] 2.27506
gwr.sel(CRIME ~ INC + HOVAL, data=columbus,
  coords=cbind(columbus$X, columbus$Y), gweight=gwr.bisquare)
#> Bandwidth: 12.65221 CV score: 8180.619 
#> Bandwidth: 20.45127 CV score: 7552.85 
#> Bandwidth: 25.27136 CV score: 7508.227 
#> Bandwidth: 23.68132 CV score: 7519.864 
#> Bandwidth: 28.25033 CV score: 7491.85 
#> Bandwidth: 30.09144 CV score: 7486.673 
#> Bandwidth: 31.69353 CV score: 7483.663 
#> Bandwidth: 31.08159 CV score: 7484.706 
#> Bandwidth: 32.21945 CV score: 7482.846 
#> Bandwidth: 32.54449 CV score: 7482.371 
#> Bandwidth: 32.74538 CV score: 7482.088 
#> Bandwidth: 32.86953 CV score: 7481.916 
#> Bandwidth: 32.94626 CV score: 7481.812 
#> Bandwidth: 32.99368 CV score: 7481.748 
#> Bandwidth: 33.02299 CV score: 7481.708 
#> Bandwidth: 33.04111 CV score: 7481.684 
#> Bandwidth: 33.0523 CV score: 7481.669 
#> Bandwidth: 33.05922 CV score: 7481.659 
#> Bandwidth: 33.0635 CV score: 7481.654 
#> Bandwidth: 33.06614 CV score: 7481.65 
#> Bandwidth: 33.06777 CV score: 7481.648 
#> Bandwidth: 33.06878 CV score: 7481.647 
#> Bandwidth: 33.06941 CV score: 7481.646 
#> Bandwidth: 33.06979 CV score: 7481.645 
#> Bandwidth: 33.07003 CV score: 7481.645 
#> Bandwidth: 33.07018 CV score: 7481.645 
#> Bandwidth: 33.07027 CV score: 7481.645 
#> Bandwidth: 33.07032 CV score: 7481.645 
#> Bandwidth: 33.07037 CV score: 7481.645 
#> Bandwidth: 33.07037 CV score: 7481.645 
#> Warning: Bandwidth converged to upper bound:33.0704149683672
#> [1] 33.07037
if (FALSE) {
data(georgia)
set.seed(1)
X0 <- runif(nrow(gSRDF)*3)
X1 <- matrix(sample(X0), ncol=3)
X1 <- prcomp(X1, center=FALSE, scale.=FALSE)$x
gSRDF$X1 <- X1[,1]
gSRDF$X2 <- X1[,2]
gSRDF$X3 <- X1[,3]
yrn <- rnorm(nrow(gSRDF))
gSRDF$yrn <- sample(yrn)
bw <- gwr.sel(yrn ~ X1 + X2 + X3, data=gSRDF, method="cv", adapt=FALSE, verbose=FALSE)
bw
bw <- gwr.sel(yrn ~ X1 + X2 + X3, data=gSRDF, method="aic", adapt=FALSE, verbose=FALSE)
bw
bw <- gwr.sel(yrn ~ X1 + X2 + X3, data=gSRDF, method="cv", adapt=TRUE, verbose=FALSE)
bw
bw <- gwr.sel(yrn ~ X1 + X2 + X3, data=gSRDF, method="aic", adapt=TRUE, verbose=FALSE)
bw
}
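The selected value is normally passed on to a model fit. The sketch below assumes the companion function `gwr()` from the same package, with a `bandwidth` argument for a fixed bandwidth and an `adapt` argument for the adaptive proportion; it is guarded so that it does nothing when spgwr or spData is not installed.

```r
# Runs only if both packages are available (an assumption about the environment)
if (requireNamespace("spgwr", quietly = TRUE) &&
    requireNamespace("spData", quietly = TRUE)) {
  data(columbus, package = "spData")
  xy <- cbind(columbus$X, columbus$Y)

  # Fixed bandwidth: the result is a distance, passed as bandwidth=
  bw <- spgwr::gwr.sel(CRIME ~ INC + HOVAL, data = columbus,
                       coords = xy, verbose = FALSE)
  fit_fixed <- spgwr::gwr(CRIME ~ INC + HOVAL, data = columbus,
                          coords = xy, bandwidth = bw)

  # Adaptive bandwidth: the result is a proportion, passed as adapt=
  q <- spgwr::gwr.sel(CRIME ~ INC + HOVAL, data = columbus,
                      coords = xy, adapt = TRUE, verbose = FALSE)
  fit_adapt <- spgwr::gwr(CRIME ~ INC + HOVAL, data = columbus,
                          coords = xy, adapt = q)
}
```

The same gweight function should be given to both the selection and the fit, since a bandwidth selected under one kernel is not comparable under another.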