All the material presented here, to the extent it is original, is available under CC-BY-SA. Parts build on joint tutorials with Edzer Pebesma.
I am running R 3.6.1, with recent update.packages()
.
needed <- c("raster", "stars", "abind", "elevatr", "sp", "mapview", "sf", "osmdata")
Script and data at https://github.com/rsbivand/ECS530_h19/raw/master/ECS530_I.zip. Download to suitable location, unzip and use as basis.
2/12 (I) Spatial data representation, (II) Support+topology, input/output
3/12 (III) Coordinate reference systems, (IV) Visualization
4/12 (V) R/GIS interfaces, project surgery
5/12 (VI) Spatial autocorrelation, (VII) Spatial regression
6/12 (VIII) Interpolation, point processes, project surgery
7/12 Presentations
Introduction and background: why break stuff (why not)?
Data input/output and representation: movement from legacy sp to standards-compliant sf representation; coordinate reference systems; developments similar to GeoPandas
and Shapely
in Python
Opportunities in visualization (tmap, mapview), but also challenges when upstream software libraries evolve (PROJ/GDAL)
Spatial weights and measures of autocorrelation: software packages previously using the sp representation may add sf representation or replace sp with sf; spdep can use both for constructing spatial weights
Spatial regression: model estimation and handling split out from spdep to spatialreg; fewer reverse dependencies, quicker changes
09:15-09:45 Course startup, practical matters, background
09:45-10:30 Vector representation: sf replaces sp, rgeos and rgdal: vector
10:30-11:00 Raster representation: stars/sf replace sp and rgdal: raster remains for now
In the early and mid 1990s, those of us who were teaching courses in spatial analysis beyond the direct application of geographical information systems (GIS) found the paucity of software limiting.
In institutions with funding for site licenses for GIS, it was possible to write or share scripts for Arc/Info (in AML), ArcView (in Avenue), or later in Visual Basic for ArcGIS.
If site licenses and associated dongles used in the field were a problem (including students involved in fieldwork in research projects), there were few alternatives, but opportunities were discussed on mailing lists.
From late 1996, the R programmimg language and environment began to be seen as an alternative for teaching and research involving spatial analysis.
R uses much of the syntax of S, then available commercially as S-Plus, but was and remains free to install, use and extend under the GNU General Public License (GPL).
In addition, it could be installed portably across multiple operating systems, including Windows and Apple MACOS.
At about the same time, the S-Plus SpatialStats module was published (Kaluzny et al. 1998), and a meeting occurred in Leicester to which many of those looking for solutions took part.
Much of the porting of S code to R for spatial statistics was begun by Albrecht Gebhardt as soon as the R package mechanism matured. Since teachers moving courses from S to R needed access to the S libraries previously used, porting was a crucial step.
CRAN listings show tripack (Renka and Gebhardt 2016) and akima (Akima and Gebhardt 2016) - both with non-open source licenses - available from August 1998 ported by Albrecht Gebhardt; ash and sgeostat (Majure and Gebhardt 2016) followed in April 1999.
The spatial package was available as part of MASS (Venables and Ripley 2002), also ported in part by Albrecht Gebhardt.
In the earliest period, CRAN administrators helped practically with porting and publication.
Albrecht and I presented an overview of possibilities of usin R for research and teaching in spatial analysis and statistics in August 1998 (Roger Bivand and Gebhardt 2000).
The S-PLUS version of splancs provided point pattern analysis (Rowlingson and Diggle 1993, 2017).
I had contacted Barry Rowlingson in 1997 but only moved forward with porting as R’s ability to load shared objects advanced.
In September 1998, I wrote to him: “It wasn’t at all difficult to get things running, which I think is a result of your coding, thank you!”
However, I added this speculation: “An issue I have thought about a little is whether at some stage Albrecht and I wouldn’t integrate or harmonize the points and pairs objects in splancs, spatial and sgeostat - they aren’t the same, but for users maybe they ought to appear to be so”.
This concern with class representations for geographical data turned out to be fruitful.
A further step was to link GRASS and R (Roger Bivand 2000), and followed up at several meetings and working closely with Markus Neteler.
The interface has evolved, and its almost current status is presented by (Lovelace, Nowosad, and Muenchow 2019), we return to the current status below.
A consequence of this work was that the CRAN team suggested that I attend a meeting in Vienna in early 2001 to talk about the GRASS GIS interface.
The meeting gave unique insights into the dynamics of R development, and very valuable contacts.
Later the same year Luc Anselin and Serge Rey asked me to take part in a workshop in Santa Barbara, which again led to many fruitful new contacts (Bivand 2006).
Further progress was made in spatial econometrics (Bivand 2002).
During the second half of 2002, it seemed relevant to propose a spatial statistics paper session at the next Vienna meeting to be held in March 2003, together with a workshop to discuss classes for spatial data.
I had reached out to Edzer Pebesma as an author of the stand-alone open source program gstat
(Pebesma and Wesseling 1998); it turned out that he had just been approached to wrap the program for S-Plus.
He saw the potential of the workshop immediately, and in November 2002 wrote in an email: “I wonder whether I should start writing S classes. I’m afraid I should.”
Virgilio Gómez-Rubio had been developing two spatial packages, RArcInfo (V. Gómez-Rubio and López-Quílez 2005; Gómez-Rubio 2011) and DCluster (V. Gómez-Rubio, Ferrándiz-Ferragud, and Lopez-Quílez 2005; Gómez-Rubio, Ferrándiz-Ferragud, and López-Quílez 2015), and was committed to participating.
Although he could not get to the workshop, Nicholas Lewin-Koh wrote in March 2003 that: “I was looking over all the DSC material, especially the spatial stuff. I did notice, after looking through peoples’ packages that there is a lot of duplication of effort. My suggestion is that we set up a repository for spatial packages similar to the Bioconductor mode, where we have a base spatial package that has S-4 based methods and classes that are efficient and general.”
Straight after the workshop, a collaborative repository for the development of software using SourceForge was established, and the R-sig-geo mailing list (still with over 3,500 subscribers) was created to facilitate interaction.
So the mandate for the development of the sp package emerged in discussions between interested contributors before, during, and especially following the 2003 Vienna workshop.
Coding meetings were organized by Barry Rowlingson in Lancaster in November 2004 and by Virgilio Gómez-Rubio in Valencia in May 2005, at both of which the class definitions and implementations were stress-tested and often changed radically; the package was first published on CRAN in April 2005.
The underlying model adopted was for S4 (new-style) classes to be used, for "Spatial"
objects, whether raster or vector, to behave like "data.frame"
objects, and for visualization methods to make it easy to show the objects.
From an early point in time, object conversion (known as coercion in S and R) to and from sp classes and classes in for example the spatstat package (Baddeley and Turner 2005; Baddeley, Rubak, and Turner 2015; Baddeley, Turner, and Rubak 2019).
Packages could choose whether they would use sp classes and methods directly, or rather use those classes for functionality that they did not provide themselves through coercion.
Reading and writing ESRI Shapefiles had been possible using the maptools package (Bivand and Lewin-Koh 2019) available from CRAN since August 2003, but rgdal, on CRAN from November 2003, initially only supported raster data read and written using the external GDAL library (Warmerdam 2008).
Further code contributions by Barry Rowlingson for handling projections using the external PROJ.4 library and the vector drivers in the then OGR part of GDAL were folded into rgdal, permitting reading into sp-objects and writing from sp-objects of vector and raster data.
For vector data it became possible to project coordinates, and in addition to transform them where datum specifications were available.
Until recently, the interfaces to external libraries GDAL and PROJ have been relatively stable, and upstream changes have not led to breaking changes for users of packages using sp classes or rgdal functionalities, although they have involved significant maintenance effort.
The final part of the framework for spatial vector data handling was the addition of the rgeos package interfacing the external GEOS library in 2011, thanks to Colin Rundell’s 2010 Google Summer of Coding project.
The rgeos package provided vector topological predicates and operations typically found in GIS such as intersection; note that by this time, both GDAL and GEOS used the Simple Features vector representation internally.
By the publication of ASDAR (Bivand, Pebesma, and Gomez-Rubio 2008), a few packages not written or maintained by the book authors and their nearest collaborators had begun to use sp classes. By the publication of the second edition (Bivand, Pebesma, and Gomez-Rubio 2013), we had seen that the number of packages depending on sp, importing from and suggesting it (in CRAN terminology for levels of dependency) had grown strongly. In late 2014, (de Vries 2014) looked at CRAN package clusters from a page rank graph, and found a clear spatial cluster that we had not expected. This cluster is from late November 2019:
Spatial data typically combine position data in 2D (or 3D), attribute data and metadata related to the position data. Much spatial data could be called map data or GIS data. We collect and handle much more position data since global navigation satellite systems (GNSS) like GPS came on stream 20 years ago, earth observation satellites have been providing data for longer.
suppressPackageStartupMessages(library(osmdata))
library(sf)
## Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
bbox <- opq(bbox = 'bergen norway')
byb0 <- osmdata_sf(add_osm_feature(bbox, key = 'railway',
value = 'light_rail'))$osm_lines
tram <- osmdata_sf(add_osm_feature(bbox, key = 'railway',
value = 'tram'))$osm_lines
byb1 <- tram[!is.na(tram$name),]
o <- intersect(names(byb0), names(byb1))
byb <- rbind(byb0[,o], byb1[,o])
saveRDS(byb, file="byb.rds")
Spatial vector data is based on points, from which other geometries are constructed. Vector data is often also termed object-based spatial data. The light rail tracks are 2D vector data. The points themselves are stored as double precision floating point numbers, typically without recorded measures of accuracy (GNSS provides a measure of accuracy). Here, lines are constructed from points.
library(mapview)
mapview(byb)
We can download monthly CSV files of use, and manipulate the input to let us use the stplanr package to aggregate origin-destination data. One destination is in Oslo, some are round trips, but otherwise things are OK. We can use to route the volumes onto cycle paths, via an API and API key. We’d still need to aggregate the bike traffic by cycle path segment for completeness.
bike_fls <- list.files("bbs")
trips0 <- NULL
for (fl in bike_fls) trips0 <- rbind(trips0,
read.csv(file.path("bbs", fl), header=TRUE))
trips0 <- trips0[trips0[, 8] < 6 & trips0[, 13] < 6,]
trips <- cbind(trips0[,c(1, 4, 2, 9)], data.frame(count=1))
from <- unique(trips0[,c(4,5,7,8)])
names(from) <- substring(names(from), 7)
to <- unique(trips0[,c(9,10,12,13)])
names(to) <- substring(names(to), 5)
stations0 <- st_as_sf(merge(from, to, all=TRUE),
coords=c("station_longitude", "station_latitude"))
stations <- aggregate(stations0, list(stations0$station_id),
head, n=1)
suppressWarnings(stations <- st_cast(stations, "POINT"))
st_crs(stations) <- 4326
od <- aggregate(trips[,-(1:4)], list(trips$start_station_id,
trips$end_station_id), sum)
od <- od[-(which(od[,1] == od[,2])),]
library(stplanr)
od_lines <- od2line(flow=od, zones=stations, zone_code="Group.1",
origin_code="Group.1", dest_code="Group.2")
saveRDS(od_lines, "od_lines.rds")
Sys.setenv(CYCLESTREET="XxXxXxXxXxXxXxXx")
od_routes <- line2route(od_lines, "route_cyclestreet",
plan = "fastest")
saveRDS(od_routes, "od_routes.rds")
Origin-destination lines
Routed lines along cycle routes