Required current contributed CRAN packages:

I am running R 3.6.1, with recent update.packages().

needed <- c("sf", "mapview", "sp", "rgdal", "wordcloud", "RColorBrewer", "rgeos", "magrittr", "igraph", "miniCRAN", "ctv", "RSQLite", "lwgeom")

Script

Script at https://github.com/rsbivand/geostat19_talk/raw/master/geostat_bivand_19.zip. Download to suitable location and use as basis.

Introduction

  • Understanding the original uses of S and R (in the 1980s and 1990s), and seeing how these uses affected the development of R lets us appreciate the robustness of R’s ecosystem.

  • This talk uses readings of the R sources and other information to explore R’s history. The topics to be touched on include the “colour” books (brown, blue, white, green), interlinkages to SICP (Scheme) and LispStat

  • This should show where and when impulses flowed between R and the development of capabilities in R for spatial data analysis

  • The focus will be on the period 1997-2004, which is when a lot of the foundations for R spatial came into being

History of R and its data structures

Sources

  • Rasmus Bååth has a useful blog piece on R’s antecedents in the S language

  • Something similar is present in the second chapter of Chambers (2016), from the viewpoint of one of those responsible for the development of the S language

  • In addition to S, we need to take SICP and Scheme into account (Abelson and Sussman 1996), as described by Ihaka and Gentleman (1996) and Wickham (2014)

  • Finally, LispStat and its creators have played and continue to play a major role in developing R (Tierney 1990, 1996, 2005)

Early R was Scheme via SICP

Ross Ihaka’s description

Ross Ihaka’s description

(JSM talk)

From S to R: Brown Books

Becker and Chambers (1984): S: An Interactive Environment for Data Analysis and Graphics, A.K.A. the Brown Book

Becker and Chambers (1985): Extending the S System

From S to R: Blue and White Books

Becker, Chambers and Wilks (1988): The New S Language: A Programming Environment for Data Analysis and Graphics, A.K.A. the Blue Book.

Chambers and Hastie (1992): Statistical Models in S, A.K.A. the White Book.

Blue and White Books

Blue and White Books

From S to R: Green Book

Chambers (1998): Programming with Data: A Guide to the S Language, A.K.A. the Green Book.

Venables and Ripley (2000): S Programming

S2 to S3 to S4

  • The S2 system was described in the Brown Book, S3 in the Blue Book and completed in the White Book, finally S4 in the Green Book

  • The big advance from S2 to S3 was that users could write functions; that data.frame objects were defined; that formula objects were defined; and that S3 classes and method dispatch appeared

  • S4 brought connections and formal S4 classes, the latter seen in R in the methods package (still controversial)

  • S-PLUS was/is the commercial implementation of S and its releases drove S3 and S4 changes

S, Bell Labs, S-PLUS

  • S was a Bell Labs innovation, like Unix, C, C++, and many interpreted languages (like AWK); many of these share key understandings

  • Now owned by Nokia, previously Alcatel-Lucent, Lucent, and AT&T

  • Why would a telecoms major (AT&T) pay for fundamental research in computer science and data analysis (not to sell or market other products better)?

  • Some Green Book examples are for quality control of telecoms components

S-PLUS and R

  • S-PLUS was quickly adopted for teaching and research, and with S3, provided extensibility in the form of libraries

  • Most links have died by now, but see this FAQ} for a flavour - there was a lively community of applied statisticians during the 1990s

  • S built on a long tradition of documentation through examples, with use cases and data sets taken from the applied statistical literature; this let users compare output with methods descriptions

  • … so we get to R

and what about LispStat?

  • Luke Tierney was in R core in 1997, and has continued to exert clear influence over development

  • Because R uses a Scheme engine, similar to Lisp, under the hood, his insight into issues like the garbage collector, namespaces, byte-compilation, serialization, parallelization, and now ALTREP has been crucial (see also the proposal by Luke Tierney, Gabe Becker and Tomas Kalibera})

  • Many of these issues involve the defensive copy on possible change policy involved in lazy evaluation, which may lead to multiple redundant copies of data being present in memory

  • Luke Tierney and Brian Ripley have fought hard to let R load fast, something that is crucial to ease the use of R on multicore systems or inside databases

R spatial and R-sig-geo

Antecedents (among others)

  • The AI-GEOSTATS listserve/mailing list was started by Gregoire Dubois in 1995; AI was from Arc/Info

  • GRASS accepted code contributions, then kept in src.contrib and src.garden and GRASS mailing lists were active (the archives seem to have been lost in moving from minordomo to mailman in 2001)

  • Bao et al. (2000) describe S-PLUS links to GRASS and ArcView; the module for spatial statistics was also from the mid-1990’s

  • MASS included S and compiled code for kriging, trend surfaces and point pattern analysis, splancs provided point pattern analysis

Albrecht Gebhardt

  • Much of the porting of S code to R for spatial statistics was begun by Albrecht Gebhardt

  • The CRAN archives show tripack and akima - both with ACM licenses - from August 1998; ash and sgeostat followed in April 1999

  • The date for the MASS spatial package is unknown because it was bundled in the VR metapackage subsequently split into spatial, MASS, nnet and class

  • In the earliest period, CRAN admins helped practically with porting

Barry Rowlingson

  • Rowlingson and Diggle (1993) describes the S-PLUS version of splancs for point pattern analysis

  • I’d contacted Barry in 1997 but only moved forward with porting splancs as R’s ability to load shared objects advanced

  • In September 1998, I wrote to him:

  • The earliest splancs in the CRAN archive is from November 2000

1998 ERSA talk

  • Albrecht and I were teaching spatial data analysis using R and code that would go into CRAN packages (I still have the package I wrote, and the exams)

  • In 1998, the ERSA (European Regional Science Association) meeting was in Vienna, and we gave a talk subsequently published in Bivand and Gebhardt (2000)

  • We covered R, point pattern analysis, geostatistics, ESDA and spatial econometrics

GEOCOMP 2000

  • Markus Neteler and I were in touch too, as I’d been trying to interface R and GRASS (R. Bivand 2000)

  • The first GRASS package in the CRAN archive is from November 2000

  • Before that, we’d presented at GeoComputation 2000 in Chatham Bivand and Neteler (2000)

Nicholas Lewin-Koh

  • Nicholas Lewin-Koh posted on the R list (subsequently split between R-help and R-devel) in September 2000, asking about coding the reading of ESRI shapefiles

  • He made a number of key contributions which ended up in spweights - currently part of spdep - and what became maptools

  • He wrote the first vector representation based on the shapefile import format, and I should be blamed for subsequent fumbling implementations in maptools

DSC 2001

  • From interaction with Kurt Hornik, Ross Ihaka and Brian Ripley on the R list during late 2000, contacts widened, including Tim Keitt (rgdal)

  • I ended up presenting at the DSC meeting in Vienna in March 2001, getting to meet many R core members, developers and CRAN admins

Luc Anselin

  • Further contacts came from a CSISS workshop organised by Luc Anselin in May 2002 in Santa Barbara, including Konstantin Krivoruchko (ESRI) and Gilberto Câmara (INPE)

  • My talk eventually appeared as Bivand (2006), but the contacts were more useful

  • Visiting the INPE earth observation lab and presenting at a conference in December 2002 provided further ideas and links to TerraLib

  • Luc himself contributed code to spdep and continues be very supportive

  • I did a further piece for the ERSA 2002 conference in Dortmund (Bivand 2002)

Getting to R spatial

  • The DSC meeting in March 2003 was also being planned for Vienna, and I’d talked about it with Kurt Hornik when he gave a keynote at the GRASS meeting in Trento in September 2002

  • I met Virgilio Gómez-Rubio at the GRASS meeting among many others and talked about DCluster and RArcInfo

Planning for R spatial

  • In mid November I started emailing contacts to invite sumbissions for a spatial statistics session at DSC, and a workshop at which we could discuss working together

  • Of the replies, far more would have liked to attend than were able to (Pebesma 2004; Gómez-Rubio, Ferrándiz-Ferragud, and López-Quílez 2005; Gómez-Rubio and López-Quílez 2005)

Workshop

Edzer Pebesma

  • Edzer was already well known because of his open-source gstat program (Pebesma and Wesseling 1998)

  • From Edzer’s reply 13 November 2002

Edzer Pebesma

Correspondence developed quickly (21 November)

Classes and Nicholas Lewin-Koh

Contribution even though he couldn’t get to Vienna:

E-mail traffic: PROJ4

R-sig-geo

StatGIS meeting 2003

  • Albrecht Gebhardt was a local organiser for a meeting at which many of those at the DSC workshop met in October 2003

useR! 2004

  • R News class description: (Pebesma and Bivand 2005)

CRAN arrivals

  • fields and geoR in mid-2001, spatstat in early 2002

  • gstat in February 2003, maptools in August 2003, rgdal in November 2003

  • sp in April 2005 (after a workshop in Lancaster November 2004 and before one in Valencia May 2005)

  • classInt in March 2006, spgrass6 in August 2006

useR! 2006

useR! 2006

Open Source Geospatial software stacks

Software component stacks

  • A chapter written in 2011 and published three years later has an overview (Bivand 2014)

  • Before discussing software component stacks for GC, we should acknowledge the importance of open standards for geospatial data interchange

  • Kralidis (2008) points out the importance of concepts such as that of spatial data infrastructure, whether established within national jurisdictions, within supranational jurisdictions, or by international standards organisations

  • A fuller coverage of the relationships between spatial data infrastructures and free and open source geospatial software is given by Steiniger and Hunter (2012)

  • Kralidis also helpfully distinguishes between formal, de facto, and ad hoc standards, which provide the flexibility needed to move ahead somewhat faster than standards committees are usually able to do

Component stacks

  • Software components appear to have been defined first by McIlroy (1969), as interchangeable subassemblies by analogy with mass production manufacturing

  • The software component stack has been a core concept of programming at least since the publication of Kernighan and Plauger(1976), systematising the experience of Bell Labs computer scientists

  • Some of the lessons are made clear in programming itself (Kernighan and Pike 1999), while others affect how one may “glue” small utility functions together in an interactive and/or scripting language (Kernighan and Pike 1984)

  • Consequently, a software component stack can be taken as sequence of component programs that are used together to achieve a common goal

  • Using stacks of components becomes attractive when task objectives can more easily be met by using components developed by others than by developing them independently

  • When the costs of keeping a stack working exceed those of rewriting, the stack may fail, but this is seldom the case

  • Open source software developers often advertise application programming interfaces (API), with an implicit promise that other downstream developers using the API will be less subject to incompatible changes

  • It is then vital that changes in these underlying components do not change the way that dependent components function, unless their earlier behaviour had been in error

Dependency challenges

  • As already noted, developers wishing to integrate software components in stacks must pay careful attention to the versioning of the components, and to the impacts of upstream changes on downstream components

  • The terms upstream and downstream refer to the ordering of the components, with data flowing from upstream to downstream components

  • If the specification of an upstream component changes, those following it will need to be modified

  • If the changes are forced by real bugs being fixed, or security holes being blocked, downstream components must react in appropriate ways

  • However, some changes occur for other reasons, such as code cleaning, reimplementation, or the resolution of licence issues in otherwise functioning code

  • In most cases, upstream developers then attempt to reduce changes in their interfaces with downstream components to an unavoidable minimum

  • Open source projects are typically most constrained with respect to developer time for maintenance, including the revision of functioning code to accommodate upstream changes that may not improve downstream performance

  • A particularly troublesome issue for dynamically linked software components in relatively long-running applications is that of thread safety

Open source geospatial projects

  • The Open Source Geospatial Foundation was brought into being in 2006 as a successor to the Mapserver Foundation, itself created the year before

  • In addition to providing a shared infrastructure and procedural framework for web mapping, desktop application and geospatial library projects, OSGeo aims to promote open source geospatial software use and development, including use integrated with proprietary software

  • McIhagga (2008) discusses some of the ways in which communities of practice have developed, with particular reference to web mapping, and in his description, the open source web mapping “ecology”

  • The Geospatial Data Abstraction Library is a crucial part of the upstream geospatial library infrastructure

  • One of the most important components required by geospatial applications is the provision of robust and clear representations of coordinate reference systems

  • Because GEOS uses OGC SFS specifications for geometries, it does not “build” topologies in the classical GIS arc-node understanding

  • GRASS (Geographic Resources Analysis Support System) was already twenty years old when the GRASS developers collaborated in founding OSGeo, and they have been playing an important role in the broader OSGeo movement (Neteler et al. 2008); many of the more recent developments in GRASS are covered by Neteler et al. (n.d.)

Ongoing changes in external sofware (GEOS, GDAL, PROJ)

PROJ

Because so much open source (and other) software uses the PROJ library and framework, many are affected when PROJ upgrades. Until very recently, PROJ has been seen as very reliable, and the changes taking place now are intended to confirm and reinforce this reliability. Before PROJ 5 (PROJ 6 is out now, PROJ 7 is coming early in 2020), the +datum= tag was used, perhaps with +towgs84= with three or seven coefficients, and possibly +nadgrids= where datum transformation grids were available. However, transformations from one projection to another first inversed to longitude-latitude in WGS84, then projected on to the target projection.

Big bump coming:

‘Fast-forward 35 years and PROJ.4 is everywhere: It provides coordinate handling for almost every geospatial program, open or closed source. Today, we see a drastical increase in the need for high accuracy GNSS coordinate handling, especially in the agricultural and construction engineering sectors. This need for geodetic-accuracy transformations is not satisfied by “classic PROJ.4”. But with the ubiquity of PROJ.4, we can provide these transformations “everywhere”, just by implementing them as part of PROJ.4’ (Evers and Knudsen 2017).

Escaping the WGS84 hub/pivot: PROJ and OGC WKT2

Following the introduction of geodetic modules and pipelines in PROJ 5 (Knudsen and Evers 2017; Evers and Knudsen 2017), PROJ 6 moves further. Changes in the legacy PROJ representation and WGS84 transformation hub have been coordinated through the GDAL barn raising initiative. Crucially WGS84 often ceases to be the pivot for moving between datums. A new OGC WKT is coming, and an SQLite EPSG file database has replaced CSV files. SRS will begin to support 3D by default, adding time too as SRS change. See also PROJ migration notes.

There are very useful postings on the PROJ mailing list from Martin Desruisseaux, first proposing clarifications and a follow-up including a summary:

  • “Early binding” ≈ hub transformation technique.
  • “Late binding” ≈ hub transformation technique NOT used, replaced by a more complex technique consisting in searching parameters in the EPSG database after the transformation context (source, target, epoch, area of interest) is known.
  • The problem of hub transformation technique is independent of WGS84. It is caused by the fact that transformations to/from the hub are approximate. Any other hub we could invent in replacement of WGS84 will have the same problem, unless we can invent a hub for which transformations are exact (I think that if such hub existed, we would have already heard about it).

The solution proposed by ISO 19111 (in my understanding) is:

  • Forget about hub (WGS84 or other), unless the simplicity of early-binding is considered more important than accuracy.
  • Associating a CRS to a coordinate set (geometry or raster) is no longer sufficient. A {CRS, epoch} tuple must be associated. ISO 19111 calls this tuple “Coordinate metadata”. From a programmatic API point of view, this means that getCoordinateReferenceSystem() method in Geometry objects (for instance) needs to be replaced by a getCoordinateMetadata() method.

Maybe watch Even Rouault’s recent FOSS4G talk: https://media.ccc.de/v/bucharest-198-revamp-of-coordinate-reference-system-management-in-the-osgeo-c-c-stack-with-proj-and-gdal

In QGIS built on current PROJ 6 with the proj.h API (and GDAL built on current PROJ 6 with the proj.h API), we see the following sequence of GUI windows when trying to open the olinda.gpkg file.

Instead of using the declared coordinate reference system of the added layer to provide a transformation/conversion relationship to possible WGS84 geographical coordinate or web mapping backgrounds, the user of the most recent QGIS version with PROJ 6 faces a choice of three alternatives with varying availabilities and precisions:

The third alternative has better precision, but depends on finding and installing an NTv2 grid file in the PROJ shared/proj metadata folder:

If we install the file, the choices change to promote the more precise NTv2-based path to the first position:

library(sf)
## Linking to GEOS 3.7.2, GDAL 3.0.1, PROJ 6.2.0
packageVersion("sf")
## [1] '0.8.0'

The final element reported by sf::sf_extSoftVersion() shows whether sf was built with the proj.h interface to PROJ, or the legacy proj.api.h interface. However, GDAL also has to be built with the proj.h interface for everything to line up:

sf_extSoftVersion()
##           GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
##        "3.7.2"        "3.0.1"        "6.2.0"         "true"         "true"
st_crs(22525)
## Coordinate Reference System:
##   EPSG: 22525 
##   proj4string: "+proj=utm +zone=25 +south +ellps=intl +towgs84=-205.57,168.77,-4.12,0,0,0,0 +units=m +no_defs"

The OGC WTK2 definition now contains a usage/scope term showing where the definition may be used; there may also be a temporal frame for a definition.

cat(system("projinfo EPSG:22525", intern=TRUE), sep="\n")
## PROJ.4 string:
## +proj=utm +zone=25 +south +ellps=intl +towgs84=-205.57,168.77,-4.12,0,0,0,0 +units=m +no_defs +type=crs
## 
## WKT2_2018 string:
## PROJCRS["Corrego Alegre 1970-72 / UTM zone 25S",
##     BASEGEOGCRS["Corrego Alegre 1970-72",
##         DATUM["Corrego Alegre 1970-72",
##             ELLIPSOID["International 1924",6378388,297,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4225]],
##     CONVERSION["UTM zone 25S",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-33,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",10000000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Brazil - east of 36°W onshore"],
##         BBOX[-10.1,-36,-4.99,-34.74]],
##     ID["EPSG",22525]]

If we ask about possible transformations/conversions, we see choices we saw among those represented in QGIS (I work on two apparently identical systems, which may give different choice counts)

cat(system("projinfo -s EPSG:22525 -t EPSG:31985", intern=TRUE), sep="\n")
## Candidate operations found: 2
## -------------------------------------
## Operation n°1:
## 
## unknown id, Inverse of UTM zone 25S + Corrego Alegre 1970-72 to SIRGAS 2000 (2) + UTM zone 25S, 5 m, Brazil - Corrego Alegre 1970-1972
## 
## PROJ string:
## +proj=pipeline +step +inv +proj=utm +zone=25 +south +ellps=intl +step +proj=push +v_3 +step +proj=cart +ellps=intl +step +proj=helmert +x=-206.05 +y=168.28 +z=-3.82 +step +inv +proj=cart +ellps=GRS80 +step +proj=pop +v_3 +step +proj=utm +zone=25 +south +ellps=GRS80
## 
## WKT2_2018 string:
## CONCATENATEDOPERATION["Inverse of UTM zone 25S + Corrego Alegre 1970-72 to SIRGAS 2000 (2) + UTM zone 25S",
##     SOURCECRS[
##         PROJCRS["Corrego Alegre 1970-72 / UTM zone 25S",
##             BASEGEOGCRS["Corrego Alegre 1970-72",
##                 DATUM["Corrego Alegre 1970-72",
##                     ELLIPSOID["International 1924",6378388,297,
##                         LENGTHUNIT["metre",1]]],
##                 PRIMEM["Greenwich",0,
##                     ANGLEUNIT["degree",0.0174532925199433]],
##                 ID["EPSG",4225]],
##             CONVERSION["UTM zone 25S",
##                 METHOD["Transverse Mercator",
##                     ID["EPSG",9807]],
##                 PARAMETER["Latitude of natural origin",0,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8801]],
##                 PARAMETER["Longitude of natural origin",-33,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8802]],
##                 PARAMETER["Scale factor at natural origin",0.9996,
##                     SCALEUNIT["unity",1],
##                     ID["EPSG",8805]],
##                 PARAMETER["False easting",500000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8806]],
##                 PARAMETER["False northing",10000000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8807]]],
##             CS[Cartesian,2],
##                 AXIS["(E)",east,
##                     ORDER[1],
##                     LENGTHUNIT["metre",1]],
##                 AXIS["(N)",north,
##                     ORDER[2],
##                     LENGTHUNIT["metre",1]],
##             ID["EPSG",22525]]],
##     TARGETCRS[
##         PROJCRS["SIRGAS 2000 / UTM zone 25S",
##             BASEGEOGCRS["SIRGAS 2000",
##                 DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
##                     ELLIPSOID["GRS 1980",6378137,298.257222101,
##                         LENGTHUNIT["metre",1]]],
##                 PRIMEM["Greenwich",0,
##                     ANGLEUNIT["degree",0.0174532925199433]],
##                 ID["EPSG",4674]],
##             CONVERSION["UTM zone 25S",
##                 METHOD["Transverse Mercator",
##                     ID["EPSG",9807]],
##                 PARAMETER["Latitude of natural origin",0,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8801]],
##                 PARAMETER["Longitude of natural origin",-33,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8802]],
##                 PARAMETER["Scale factor at natural origin",0.9996,
##                     SCALEUNIT["unity",1],
##                     ID["EPSG",8805]],
##                 PARAMETER["False easting",500000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8806]],
##                 PARAMETER["False northing",10000000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8807]]],
##             CS[Cartesian,2],
##                 AXIS["(E)",east,
##                     ORDER[1],
##                     LENGTHUNIT["metre",1]],
##                 AXIS["(N)",north,
##                     ORDER[2],
##                     LENGTHUNIT["metre",1]],
##             ID["EPSG",31985]]],
##     STEP[
##         CONVERSION["Inverse of UTM zone 25S",
##             METHOD["Inverse of Transverse Mercator",
##                 ID["INVERSE(EPSG)",9807]],
##             PARAMETER["Latitude of natural origin",0,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8801]],
##             PARAMETER["Longitude of natural origin",-33,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8802]],
##             PARAMETER["Scale factor at natural origin",0.9996,
##                 SCALEUNIT["unity",1],
##                 ID["EPSG",8805]],
##             PARAMETER["False easting",500000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8806]],
##             PARAMETER["False northing",10000000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8807]],
##             ID["INVERSE(EPSG)",16125]]],
##     STEP[
##         COORDINATEOPERATION["Corrego Alegre 1970-72 to SIRGAS 2000 (2)",
##             VERSION["PBS-Bra 2005"],
##             SOURCECRS[
##                 GEOGCRS["Corrego Alegre 1970-72",
##                     DATUM["Corrego Alegre 1970-72",
##                         ELLIPSOID["International 1924",6378388,297,
##                             LENGTHUNIT["metre",1]]],
##                     PRIMEM["Greenwich",0,
##                         ANGLEUNIT["degree",0.0174532925199433]],
##                     CS[ellipsoidal,2],
##                         AXIS["geodetic latitude (Lat)",north,
##                             ORDER[1],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                         AXIS["geodetic longitude (Lon)",east,
##                             ORDER[2],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                     ID["EPSG",4225]]],
##             TARGETCRS[
##                 GEOGCRS["SIRGAS 2000",
##                     DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
##                         ELLIPSOID["GRS 1980",6378137,298.257222101,
##                             LENGTHUNIT["metre",1]]],
##                     PRIMEM["Greenwich",0,
##                         ANGLEUNIT["degree",0.0174532925199433]],
##                     CS[ellipsoidal,2],
##                         AXIS["geodetic latitude (Lat)",north,
##                             ORDER[1],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                         AXIS["geodetic longitude (Lon)",east,
##                             ORDER[2],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                     ID["EPSG",4674]]],
##             METHOD["Geocentric translations (geog2D domain)",
##                 ID["EPSG",9603]],
##             PARAMETER["X-axis translation",-206.05,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8605]],
##             PARAMETER["Y-axis translation",168.28,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8606]],
##             PARAMETER["Z-axis translation",-3.82,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8607]],
##             OPERATIONACCURACY[5.0],
##             ID["EPSG",6193],
##             REMARK["Formed by concatenation of tfms codes 6191 and 15485. May be used as transformation between Corrego Alegre 1970-72 and WGS 84 - see tfm code 6194. Used by Petrobras and ANP from February 2005."]]],
##     STEP[
##         CONVERSION["UTM zone 25S",
##             METHOD["Transverse Mercator",
##                 ID["EPSG",9807]],
##             PARAMETER["Latitude of natural origin",0,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8801]],
##             PARAMETER["Longitude of natural origin",-33,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8802]],
##             PARAMETER["Scale factor at natural origin",0.9996,
##                 SCALEUNIT["unity",1],
##                 ID["EPSG",8805]],
##             PARAMETER["False easting",500000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8806]],
##             PARAMETER["False northing",10000000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8807]],
##             ID["EPSG",16125]]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Brazil - Corrego Alegre 1970-1972"],
##         BBOX[-33.78,-58.16,-2.68,-34.74]]]
## 
## -------------------------------------
## Operation n°2:
## 
## unknown id, Inverse of UTM zone 25S + Corrego Alegre 1970-72 to SIRGAS 2000 (1) + UTM zone 25S, 2 m, Brazil - Corrego Alegre 1970-1972, at least one grid missing
## 
## PROJ string:
## +proj=pipeline +step +inv +proj=utm +zone=25 +south +ellps=intl +step +proj=hgridshift +grids=CA7072_003.gsb +step +proj=utm +zone=25 +south +ellps=GRS80
## 
## WKT2_2018 string:
## CONCATENATEDOPERATION["Inverse of UTM zone 25S + Corrego Alegre 1970-72 to SIRGAS 2000 (1) + UTM zone 25S",
##     SOURCECRS[
##         PROJCRS["Corrego Alegre 1970-72 / UTM zone 25S",
##             BASEGEOGCRS["Corrego Alegre 1970-72",
##                 DATUM["Corrego Alegre 1970-72",
##                     ELLIPSOID["International 1924",6378388,297,
##                         LENGTHUNIT["metre",1]]],
##                 PRIMEM["Greenwich",0,
##                     ANGLEUNIT["degree",0.0174532925199433]],
##                 ID["EPSG",4225]],
##             CONVERSION["UTM zone 25S",
##                 METHOD["Transverse Mercator",
##                     ID["EPSG",9807]],
##                 PARAMETER["Latitude of natural origin",0,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8801]],
##                 PARAMETER["Longitude of natural origin",-33,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8802]],
##                 PARAMETER["Scale factor at natural origin",0.9996,
##                     SCALEUNIT["unity",1],
##                     ID["EPSG",8805]],
##                 PARAMETER["False easting",500000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8806]],
##                 PARAMETER["False northing",10000000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8807]]],
##             CS[Cartesian,2],
##                 AXIS["(E)",east,
##                     ORDER[1],
##                     LENGTHUNIT["metre",1]],
##                 AXIS["(N)",north,
##                     ORDER[2],
##                     LENGTHUNIT["metre",1]],
##             ID["EPSG",22525]]],
##     TARGETCRS[
##         PROJCRS["SIRGAS 2000 / UTM zone 25S",
##             BASEGEOGCRS["SIRGAS 2000",
##                 DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
##                     ELLIPSOID["GRS 1980",6378137,298.257222101,
##                         LENGTHUNIT["metre",1]]],
##                 PRIMEM["Greenwich",0,
##                     ANGLEUNIT["degree",0.0174532925199433]],
##                 ID["EPSG",4674]],
##             CONVERSION["UTM zone 25S",
##                 METHOD["Transverse Mercator",
##                     ID["EPSG",9807]],
##                 PARAMETER["Latitude of natural origin",0,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8801]],
##                 PARAMETER["Longitude of natural origin",-33,
##                     ANGLEUNIT["degree",0.0174532925199433],
##                     ID["EPSG",8802]],
##                 PARAMETER["Scale factor at natural origin",0.9996,
##                     SCALEUNIT["unity",1],
##                     ID["EPSG",8805]],
##                 PARAMETER["False easting",500000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8806]],
##                 PARAMETER["False northing",10000000,
##                     LENGTHUNIT["metre",1],
##                     ID["EPSG",8807]]],
##             CS[Cartesian,2],
##                 AXIS["(E)",east,
##                     ORDER[1],
##                     LENGTHUNIT["metre",1]],
##                 AXIS["(N)",north,
##                     ORDER[2],
##                     LENGTHUNIT["metre",1]],
##             ID["EPSG",31985]]],
##     STEP[
##         CONVERSION["Inverse of UTM zone 25S",
##             METHOD["Inverse of Transverse Mercator",
##                 ID["INVERSE(EPSG)",9807]],
##             PARAMETER["Latitude of natural origin",0,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8801]],
##             PARAMETER["Longitude of natural origin",-33,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8802]],
##             PARAMETER["Scale factor at natural origin",0.9996,
##                 SCALEUNIT["unity",1],
##                 ID["EPSG",8805]],
##             PARAMETER["False easting",500000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8806]],
##             PARAMETER["False northing",10000000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8807]],
##             ID["INVERSE(EPSG)",16125]]],
##     STEP[
##         COORDINATEOPERATION["Corrego Alegre 1970-72 to SIRGAS 2000 (1)",
##             VERSION["IBGE-Bra"],
##             SOURCECRS[
##                 GEOGCRS["Corrego Alegre 1970-72",
##                     DATUM["Corrego Alegre 1970-72",
##                         ELLIPSOID["International 1924",6378388,297,
##                             LENGTHUNIT["metre",1]]],
##                     PRIMEM["Greenwich",0,
##                         ANGLEUNIT["degree",0.0174532925199433]],
##                     CS[ellipsoidal,2],
##                         AXIS["geodetic latitude (Lat)",north,
##                             ORDER[1],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                         AXIS["geodetic longitude (Lon)",east,
##                             ORDER[2],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                     ID["EPSG",4225]]],
##             TARGETCRS[
##                 GEOGCRS["SIRGAS 2000",
##                     DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
##                         ELLIPSOID["GRS 1980",6378137,298.257222101,
##                             LENGTHUNIT["metre",1]]],
##                     PRIMEM["Greenwich",0,
##                         ANGLEUNIT["degree",0.0174532925199433]],
##                     CS[ellipsoidal,2],
##                         AXIS["geodetic latitude (Lat)",north,
##                             ORDER[1],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                         AXIS["geodetic longitude (Lon)",east,
##                             ORDER[2],
##                             ANGLEUNIT["degree",0.0174532925199433]],
##                     ID["EPSG",4674]]],
##             METHOD["NTv2",
##                 ID["EPSG",9615]],
##             PARAMETERFILE["Latitude and longitude difference file","CA7072_003.gsb"],
##             OPERATIONACCURACY[2.0],
##             ID["EPSG",5526],
##             REMARK["May be used as transformation between Corrego Alegre 1970-72 and WGS 84 - see tfm code 5541."]]],
##     STEP[
##         CONVERSION["UTM zone 25S",
##             METHOD["Transverse Mercator",
##                 ID["EPSG",9807]],
##             PARAMETER["Latitude of natural origin",0,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8801]],
##             PARAMETER["Longitude of natural origin",-33,
##                 ANGLEUNIT["degree",0.0174532925199433],
##                 ID["EPSG",8802]],
##             PARAMETER["Scale factor at natural origin",0.9996,
##                 SCALEUNIT["unity",1],
##                 ID["EPSG",8805]],
##             PARAMETER["False easting",500000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8806]],
##             PARAMETER["False northing",10000000,
##                 LENGTHUNIT["metre",1],
##                 ID["EPSG",8807]],
##             ID["EPSG",16125]]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Brazil - Corrego Alegre 1970-1972"],
##         BBOX[-33.78,-58.16,-2.68,-34.74]]]
## 
## Grid CA7072_003.gsb needed but not found on the system.

The input data use the Corrego Alegre 1970-1972 setting, and still provide a +towgs84= key representation for pivoting through WGS84:

olinda <- st_read("data/olinda.gpkg", quiet=TRUE)
st_crs(olinda)
## Coordinate Reference System:
##   No EPSG code
##   proj4string: "+proj=utm +zone=25 +south +ellps=intl +towgs84=-205.57,168.77,-4.12,0,0,0,0 +units=m +no_defs"

We’ll just use one point to check things out:

xy_c <- st_centroid(st_geometry(olinda[  1,]))
st_coordinates(xy_c)
##          X       Y
## 1 295460.1 9120358

If we manually pivot through WGS84 on the way back to SIRGAS2000 UTM, we get:

st_coordinates(st_transform(st_transform(xy_c, 4326), 31985))
##          X       Y
## 1 295489.3 9120352

Without the NTv2 grid file CA7072_003.gsb we seem to get the same:

# without CA7072_003.gsb
st_coordinates(st_transform(xy_c, 31985))
##          X       Y
## 1 295489.3 9120352

but we also get the same with the grid file if we leave the +towgs84= key in the PROJ string:

# with CA7072_003.gsb
st_coordinates(st_transform(xy_c, 31985))
#          X       Y
# 1 295489.3 9120352

If however we manipulate the PROJ string to specify the grid file instead of the +towgs84= key, we can get the improved precision:

# with CA7072_003.gsb
xy_c1 <- xy_c
st_crs(xy_c1) <- "+proj=utm +zone=25 +south +ellps=intl +units=m +nadgrids=CA7072_003.gsb"
print(st_coordinates(st_transform(xy_c1, 31985)), digits=9)
#            X          Y
# 1 295486.396 9120350.62

Let’s try to use the PROJ utility program cs2cs in its PROJ 6 version. The cs2cs version when the grid file is present matches sf::st_transform() when the input CRS is modified to point to the grid file:

# with CA7072_003.gsb
cat(system(paste0("echo ", paste(xy, collapse=" "), " | cs2cs EPSG:22525 EPSG:31985"), intern=TRUE))
# 295486.40 9120350.62 0.00

cs2cs without the grid file gives:

xy <- st_coordinates(xy_c)
# without CA7072_003.gsb
cat(system(paste0("echo ", paste(xy, collapse=" "), " | cs2cs EPSG:22525 EPSG:31985"), intern=TRUE))
## 295488.64    9120352.16 0.00

This matches the second set of +towgs84= coefficients:

# without CA7072_003.gsb
xy_c2 <- xy_c
st_crs(xy_c2) <- "+proj=utm +zone=25 +south +ellps=intl +units=m +towgs84=-206.05,168.28,-3.82,0,0,0,0"
st_coordinates(st_transform(xy_c2, 31985))
##          X       Y
## 1 295488.6 9120352

Using the lwgeom::st_transform_proj() for now uses the proh_api.h interface:

# without CA7072_003.gsb
# -DACCEPT_USE_OF_DEPRECATED_PROJ_API_H
st_coordinates(lwgeom::st_transform_proj(xy_c, 31985))
##          X       Y
## 1 295489.3 9120352

This is the EPSG description of the grid file: https://epsg.io/5541

It was retrieved from: https://www.eye4software.com/files/ntv2/ca70.zip

This page gives a picture of why the changes in PROJ matter - the arrows are in cm per year displacement.

Some grid files are available from https://proj.org/download.html, but because many others are not as freely available (yet), they may need to be dwnloaded from national mapping agencies. Most are relatively large, and also need to be versioned. Do read the README files in the zip archives!

GEOS

A recent upgrade of GEOS from 3.7.1 to 3.7.2 on a CRAN test server led to failures in three packages using rgeos for topological operations. rgeos 0.4-3 set the checkValidity= argument to for example gIntersection() to FALSE (TRUE threw an error if either geometry was invalid). An issue was opened on the sf github repository (rgeos is developed on R-Forge). The test objects (from an example from inlmisc) will be used here:

rgeos::version_GEOS0()
## [1] "3.7.2"
## attr(,"rev")
## [1] "b55d2125"

For rgeos <= 0.4-3, the default was not to check input geometries for validity before trying topological operations, for >= 0.5-1, the default changes when GEOS > 3.7.1 to check for validity. The mode of the argument also changes to integer from logical:

cV_old_default <- ifelse(rgeos::version_GEOS0() >= "3.7.2", 0L, FALSE)
yy <- rgeos::readWKT(readLines("data/invalid.wkt"))
rgeos::gIsValid(yy, byid=TRUE, reason=TRUE)
##                             1                             2 
## "Ring Self-intersection[2 3]"              "Valid Geometry" 
##                             3                             4 
##              "Valid Geometry"              "Valid Geometry"
sf::sf_extSoftVersion()
##           GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H 
##        "3.7.2"        "3.0.1"        "6.2.0"         "true"         "true"

The same underlyng GEOS code is used in sf:

sf::st_is_valid(sf::st_as_sf(yy), reason=TRUE)
## [1] "Ring Self-intersection[2 3]" "Valid Geometry"             
## [3] "Valid Geometry"              "Valid Geometry"

The geometries were also invalid in GEOS 3.7.1, but the operations succeeded:

ply <- rgeos::readWKT(readLines("data/ply.wkt"))
oo <- try(rgeos::gIntersection(yy, ply, byid=TRUE, checkValidity=cV_old_default), silent=TRUE)
print(attr(oo, "condition")$message)
## [1] "TopologyException: Input geom 0 is invalid: Ring Self-intersection at or near point 2 3 at 2 3"
ooo <- try(sf::st_intersection(sf::st_as_sf(yy), sf::st_as_sf(ply)), silent=TRUE)
print(attr(oo, "condition")$message)
## [1] "TopologyException: Input geom 0 is invalid: Ring Self-intersection at or near point 2 3 at 2 3"

In rgeos 0.5-1 and GEOS 3.7.2, new warnings are provided, and advice to check validity.

cV_new_default <- ifelse(rgeos::version_GEOS0() >= "3.7.2", 1L, TRUE)
try(rgeos::gIntersection(yy, ply, byid=TRUE, checkValidity=cV_new_default), silent=TRUE)
## Warning in RGEOSUnaryPredFunc(spgeom, byid, "rgeos_isvalid"): Ring Self-
## intersection at or near point 2 3
## yy is invalid
## Warning in rgeos::gIntersection(yy, ply, byid = TRUE, checkValidity
## = cV_new_default): Invalid objects found; consider using
## set_RGEOS_CheckValidity(2L)

New options are provided, get_RGEOS_CheckValidity() and set_RGEOS_CheckValidity(), because in some packages the use of topological operations may happen through other packages, such as raster::crop() calling rgeos::gIntersection() without access to the arguments of the latter function.

If we follow the advice, zero-width buffering is used to try to rectify the invalidity:

oo <- rgeos::gIntersection(yy, ply, byid=TRUE, checkValidity=2L)
## Warning in RGEOSUnaryPredFunc(spgeom, byid, "rgeos_isvalid"): Ring Self-
## intersection at or near point 2 3
## yy is invalid
## Attempting to make yy valid by zero-width buffering
rgeos::gIsValid(oo)
## [1] TRUE

equivalently:

oo <- rgeos::gIntersection(rgeos::gBuffer(yy, byid=TRUE, width=0), ply, byid=TRUE, checkValidity=1L)
rgeos::gIsValid(oo)
## [1] TRUE

and by extension to sf until GEOS 3.7.2 is accommodated:

ooo <- sf::st_intersection(sf::st_buffer(sf::st_as_sf(yy), dist=0), sf::st_as_sf(ply))
all(sf::st_is_valid(ooo))
## [1] TRUE

The actual cause was the use of an ESRI/shapefile style/understanding of the self-touching exterior ring. In OGC style, an interior ring is required, but not in shapefile style. Martin Davis responded in the issue:

The problem turned out to be a noding robustness issue, which caused the valid input linework to have a self-touch after noding. This caused the output to be invalid. The fix was to tighten up the internal overlay noding validation check to catch this situation. This has the side-effect of detecting (and failing) all self-touches in input geometry. Previously, vertex-vertex self-touches were not detected, and in many cases they would simply propagate through the overlay algorithm. (This made the output invalid as well, but since the inputs were already invalid this behaviour was considered acceptable).

The change in GEOS behaviour was not planned as such, but has consequences, fortunately detected because CRAN checks by default much more than say Travis by default. Zero-width buffering will not repair all cases of invalidity, but does work here.

PROJ 6 impacts ASDAR

In the Soho cholera data set; I also converted the shapefiles from https://asdar-book.org/bundles2ed/die_bundle.zip to GPKG to be more modern (using ogr2ogr in GDAL 3 built against PROJ 6). sf is installed using the proj.h interface in PROJ 6:

buildings <- sf::st_read("data/buildings.gpkg", quiet=TRUE)
st_crs(buildings)
## Coordinate Reference System:
##   No EPSG code
##   proj4string: "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs"

To make an interactive display in mapview(), conversion/transformation to “Web Mercator” is needed - this uses a WGS84 datum. But PROJ 6 has dropped the +datum= tag, so the display is not correctly registered.

library(mapview)
mapview(buildings)

The CRS/SRS values in the GPKG file (it is a multi-table SQLite database) include the datum definition:

library(RSQLite)
db = dbConnect(SQLite(), dbname="data/buildings.gpkg")
dbReadTable(db, "gpkg_spatial_ref_sys")$definition[4]
## [1] "PROJCS[\"Transverse_Mercator\",GEOGCS[\"GCS_OSGB 1936\",DATUM[\"OSGB_1936\",SPHEROID[\"Airy 1830\",6377563.396,299.3249646,AUTHORITY[\"EPSG\",\"7001\"]],AUTHORITY[\"EPSG\",\"6277\"]],PRIMEM[\"Greenwich\",0],UNIT[\"Degree\",0.0174532925199433],AUTHORITY[\"EPSG\",\"4277\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",49],PARAMETER[\"central_meridian\",-2],PARAMETER[\"scale_factor\",0.999601],PARAMETER[\"false_easting\",400000],PARAMETER[\"false_northing\",-100000],UNIT[\"metre\",1,AUTHORITY[\"EPSG\",\"9001\"]],AXIS[\"Easting\",EAST],AXIS[\"Northing\",NORTH]]"
dbDisconnect(db)

Maybe using rgdal which is built using PROJ 6 but the legacy proj_api.h interface, and the shapefile as shipped with ASDAR reproduction materials will help?

buildings1 <- rgdal::readOGR("data/buildings.shp", verbose=FALSE)
sp::proj4string(buildings1)
## [1] "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs"

No, same problem:

mapview(buildings1)

But the shapefile has the datum definition:

readLines("data/buildings.prj")
## [1] "PROJCS[\"Transverse_Mercator\",GEOGCS[\"GCS_OSGB 1936\",DATUM[\"D_OSGB_1936\",SPHEROID[\"Airy_1830\",6377563.396,299.3249646]],PRIMEM[\"Greenwich\",0],UNIT[\"Degree\",0.017453292519943295]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",49.00000000000002],PARAMETER[\"central_meridian\",-2],PARAMETER[\"scale_factor\",0.999601],PARAMETER[\"false_easting\",400000],PARAMETER[\"false_northing\",-100000],UNIT[\"Meter\",1]]"

So in both cases with PROJ 6, we need to manipulate the CRS read in with the file to insert our choice of how to make the transformation, because the definition as read no longer contains it:

fixed <- "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +nadgrids=OSTN15_NTv2_OSGBtoETRS.gsb +units=m +no_defs"
st_crs(buildings) <- fixed
sp::proj4string(buildings1) <- sp::CRS(fixed)
mapview(buildings)
mapview(buildings1)

CRAN packages

CRAN

  • Once S3 permitted extension by writing functions, and packaging functions in libraries, S and R ceased to be monolithic

  • In R, a library is where packages are kept, distinguishing between base and recommended packages distributed with R, and contributed packages

  • Contributed packages can be installed from CRAN (infrastructure built on CPAN and CTAN for Perl and Tex), Bioconductor, other package repositories, and other sources such as github

  • With over 14500 contributed packages, CRAN is central to the R community, but is stressed by dependency issues (CRAN is not run by R core)

CRAN package clusters

  • Andrie de Vries Finding clusters of CRAN packages using igraph looked at CRAN package clusters from a page rank graph

  • Here we will be choosing the packages listed in the Spatial and SpatioTemporal task views as contained in the ctv package, and their pagerank relationships

cran <- available.packages()
library(ctv)
obj1 <- read.ctv(system.file("ctv/Spatial.ctv", package="ctv"))
obj2 <- read.ctv(system.file("ctv/SpatioTemporal.ctv", package="ctv"))
sp_ctv_pkgs <- sort(unique(c(obj1$packagelist$name, obj2$packagelist$name)))
pdb <- cran[cran[, "Package"] %in% sp_ctv_pkgs, ]
suppressPackageStartupMessages(library(miniCRAN))
suppressPackageStartupMessages(library(igraph))
suppressPackageStartupMessages(library(magrittr))
pg <- makeDepGraph(pdb[, "Package"], availPkgs = cran, suggests=TRUE, enhances=TRUE, includeBasePkgs = FALSE)
pr <- pg %>%
page.rank(directed = FALSE) %>%
use_series("vector") %>%
sort(decreasing = TRUE) %>%
as.matrix %>%
set_colnames("page.rank")

CRAN package page rank scores

print(pr[1:30,], digits=4)
##         Rcpp           sp         MASS        knitr     maptools 
##     0.025042     0.018962     0.011704     0.009286     0.008289 
##      lattice       raster        rgdal     testthat        spdep 
##     0.008224     0.007997     0.007912     0.007706     0.007587 
##           sf       Matrix     spatstat surveillance        rgeos 
##     0.007581     0.007314     0.007234     0.006617     0.006316 
##      ggplot2          plm         lme4      plotKML RColorBrewer 
##     0.006186     0.005782     0.005354     0.005183     0.005108 
##    rmarkdown     magrittr          rgl      stplanr RandomFields 
##     0.005094     0.004956     0.004764     0.004668     0.004605 
##        gstat     jsonlite      pkgdown        spaMM    spacetime 
##     0.004551     0.004512     0.004421     0.004267     0.004224
topClusters <- table(cl$membership) %>%
sort(decreasing = TRUE) %>%
head(25)
cluster <- function(i, clusters, pagerank, n=10){
group <- clusters$names[clusters$membership == i]
pagerank[group, ] %>% sort(decreasing = TRUE) %>% head(n)
}
z <- lapply(names(topClusters)[1:15], cluster, clusters=cl, pagerank=pr, n=20)

First package cluster

z[[1]][1:20]
##           sp        knitr     maptools      lattice       raster 
##  0.018961701  0.009286065  0.008289421  0.008223655  0.007996581 
##        rgdal     testthat           sf        rgeos      ggplot2 
##  0.007911978  0.007706242  0.007581470  0.006315732  0.006185982 
##      plotKML RColorBrewer    rmarkdown      stplanr        gstat 
##  0.005182947  0.005107896  0.005094181  0.004667842  0.004550959 
##    spacetime      stringr        Hmisc      mapview       igraph 
##  0.004224012  0.003855975  0.003654207  0.003606242  0.003573066

Second package cluster

z[[2]][1:20]
##         MASS        spdep       Matrix     spatstat surveillance 
##  0.011703869  0.007586704  0.007313960  0.007234205  0.006617480 
## RandomFields         nlme      foreach   sperrorest     CARBayes 
##  0.004604721  0.003584424  0.003286419  0.003216616  0.002860922 
##    geostatsp      FLightR          car      biomod2   CARBayesST 
##  0.002850512  0.002847370  0.002710488  0.002653262  0.002645306 
##         mgcv       georob         boot         lgcp         geoR 
##  0.002580343  0.002522789  0.002427513  0.002255335  0.002160760

Third package cluster

z[[3]][1:20]
##    magrittr    jsonlite     pkgdown          R6        httr      digest 
## 0.004955838 0.004512034 0.004421034 0.003471628 0.003464246 0.003267953 
##     usethis   geojsonio    roxygen2        covr        curl       lintr 
## 0.002763474 0.002379277 0.002355452 0.002225590 0.001975295 0.001901108 
##     geojson        xml2      RNeXML     osmdata        crul  rstudioapi 
## 0.001873022 0.001813405 0.001775832 0.001736993 0.001583010 0.001508990 
##  rmapshaper   rsconnect 
## 0.001381217 0.001356919

Fourth package cluster

z[[4]][1:20]
##          Rcpp           rgl          ctmm RcppArmadillo            BH 
##  0.0250415771  0.0047642536  0.0036110230  0.0021795803  0.0020143644 
##     RcppEigen         dbmss         dodgr          smam         smerc 
##  0.0017826835  0.0017406331  0.0017066628  0.0016736023  0.0016242956 
##       pbapply      RSpectra    SpatialEpi       smacpod       Gmedian 
##  0.0014511915  0.0012763535  0.0011686080  0.0011488122  0.0011345323 
##       svglite        spsann         minqa     ngspatial  RcppParallel 
##  0.0010568585  0.0010399338  0.0010203580  0.0010022323  0.0009753075

CRAN top two page rank clusters

library(RColorBrewer)
library(wordcloud)
opar <- par(mar=c(0,0,0,0)+0.1, mfrow=c(1,2))
for (i in 1:2) wordcloud(names(z[[i]]), freq=unname(z[[i]]), scale=rev(8*range(unname(z[[i]]))/max(unname(z[[4]]))))

par(opar)

CRAN third and fourth page rank clusters

opar <- par(mar=c(0,0,0,0)+0.1, mfrow=c(1,2))
for (i in 3:4) wordcloud(names(z[[i]]), freq=unname(z[[i]]), scale=rev(8*range(unname(z[[i]]))/max(unname(z[[4]]))))

par(opar)

Roundup: history

  • R itself is very varied in the way that approaches have established niches through time

  • sp classes date back to the same time as Bioconductor, which has a fairly definite S4 preference

  • Communities develop and change over time, and continue to do so, with simultaneous differences in preferences

  • so what comes after tidyverse and “data science”? Should we be preparing already?

References

Abelson, Harold, and Gerald Jay Sussman. 1996. Structure and Interpretation of Computer Programs. Boston, MA: MIT Press.

Bao, Shuming, Luc Anselin, Doug Martin, and Diana Stralberg. 2000. “Seamless Integration of Spatial Statistics and GIS: The S-PLUS for ArcView and the S+Grassland Links.” Journal of Geographical Systems 2 (2): 287–306.

Becker, R.A., and J.M. Chambers. 1984. S: An Interactive Environment for Data Analysis and Graphics. Pacific Grove, CA, USA: Wadsworth & Brooks/Cole.

———. 1985. Extending the S System. Pacific Grove, CA, USA: Wadsworth & Brooks/Cole.

Becker, Richard A., John M. Chambers, and Allan R. Wilks. 1988. The New S Language. London: Chapman & Hall.

Bivand, Roger. 2000. “Using the R Statistical Data Analysis Language on GRASS 5.0 GIS Database Files.” Computers & Geosciences 26 (9): 1043–52. https://doi.org/https://doi.org/10.1016/S0098-3004(00)00057-1.

———. 2002. “Spatial Econometrics Functions in R: Classes and Methods.” Journal of Geographical Systems 4 (4): 405–21. https://doi.org/10.1007/s101090300096.

———. 2006. “Implementing Spatial Data Analysis Software Tools in R.” Geographical Analysis 38 (1): 23–40. https://doi.org/10.1111/j.0016-7363.2005.00672.x.

———. 2014. “Geocomputation and Open Source Software:Components and Software Stacks.” In Geocomputation, edited by Robert J. Abrahart and Linda M. See, 329–55. Boca Raton: CRC Press. http://hdl.handle.net/11250/163358.

Bivand, Roger, and Albrecht Gebhardt. 2000. “Implementing Functions for Spatial Statistical Analysis Using the R Language.” Journal of Geographical Systems 2 (3): 307–17. https://doi.org/10.1007/PL00011460.

Bivand, Roger, and Markus Neteler. 2000. Open Source Geocomputation: Using the R Data Analysis Language Integrated with GRASS GIS and PostgreSQL Data Base Systems. http://www.geocomputation.org/2000/GC009/Gc009.htm.

Chambers, John M. 1998. Programming with Data. New York: Springer.

———. 2016. Extending R. Boca Raton: Chapman & Hall.

Chambers, John M., and Trevor J. Hastie. 1992. Statistical Models in S. London: Chapman & Hall.

Evers, Kristian, and Thomas Knudsen. 2017. Transformation Pipelines for Proj.4. https://www.fig.net/resources/proceedings/fig_proceedings/fig2017/papers/iss6b/ISS6B_evers_knudsen_9156.pdf.

Gómez-Rubio, V., J. Ferrándiz-Ferragud, and A. López-Quílez. 2005. “Detecting Clusters of Disease with R.” Journal of Geographical Systems 7: 189–206.

Gómez-Rubio, V., and A. López-Quílez. 2005. “RArcInfo: Using GIS Data with R.” Computers and Geosciences 31: 1000–1006.

Ihaka, Ross, and Robert Gentleman. 1996. “R: A Language for Data Analysis and Graphics.” Journal of Computational and Graphical Statistics 5 (3): 299–314. https://doi.org/10.1080/10618600.1996.10474713.

Kernighan, Brian W., and Rob Pike. 1984. The UNIX Programming Environment. Englewood Cliffs, N. J.: Prentice-Hall.

———. 1999. The Practice of Programming. Reading, Mass.: Addison-Wesley.

Kernighan, Brian W., and P. J. Plauger. 1976. Software Tools. Reading, Mass.: Addison-Wesley.

Knudsen, Thomas, and Kristian Evers. 2017. Transformation Pipelines for Proj.4. https://meetingorganizer.copernicus.org/EGU2017/EGU2017-8050.pdf.

Kralidis, Athanasios Tom. 2008. “Geospatial Open Source and Open Standards Convergences.” In Open Source Approaches in Spatial Data Handling, edited by G. B. Hall and M. Leahy, 1–20. Berlin: Springer-Verlag.

McIhagga, David. 2008. “Communities of Practice and the Business of Open Sourceweb Mapping.” In Open Source Approaches in Spatial Data Handling, edited by G. B. Hall and M. Leahy, 49–64. Berlin: Springer-Verlag.

McIlroy, M. D. 1969. “’Mass produced’ software components.” In Software Engineering, edited by Peter Naur and Brian Randell, 79–87. Brussels: Conference on Software Engineering, NATO Science Committee; NATO Scientific Affairs Division.

Neteler, Markus, M. Hamish Bowman, Martin Landa, and Markus Metz. n.d. “GRASS GIS: A multi-purpose open source GIS.” Environmental Modelling & Software 31 (1): 124–30.

Neteler, M., D.E. Beaudette, P. Cavallini, L. Lami, and J. Cepicky. 2008. “GRASS GIS.” In Open Source Approaches in Spatial Data Handling, edited by G. B. Hall and M. Leahy, 171–99. Berlin: Springer-Verlag.

Pebesma, Edzer J. 2004. “Multivariable Geostatistics in S: The Gstat Package.” Computers and Geosciences 30: 683–91.

Pebesma, E. J., and R. S. Bivand. 2005. “Classes and Methods for Spatial Data in R.” R News 5 (2): 9–13.

Pebesma, E. J., and C. G. Wesseling. 1998. “Gstat, a Program for Geostatistical Modelling, Prediction and Simulation.” Computers and Geosciences 24: 17–31.

Rowlingson, B., and P. J. Diggle. 1993. “Splancs: Spatial Point Pattern Analysis Code in S-Plus.” Computers and Geosciences 19: 627–55.

Steiniger, Stefan, and Andrew J. S. Hunter. 2012. “Free and Open Source GIS Software for Building a Spatial Data Infrastructure.” In Geospatial Free and Open Source Software in the 21st Century, edited by Erwan Bocher and Markus Neteler, 247–61. Lecture Notes in Geoinformation and Cartography. Berlin Heidelberg: Springer.

Tierney, Luke. 1990. LISP-STAT: An Object-Oriented Environment for Statistical Computing and Dynamic Graphics. Wiley, New York, NY.

———. 1996. “Recent Developments and Future Directions in Lisp-Stat.” Journal of Computational and Graphical Statistics 5 (3): 250–62.

———. 2005. “Some Notes on the Past and Future of Lisp-Stat.” Journal of Statistical Software 13 (9): 1–15. https://doi.org/10.18637/jss.v013.i09.

Venables, William N., and Brian D. Ripley. 2000. S Programming. New York: Springer. http://www.stats.ox.ac.uk/pub/MASS3/Sprog/.

Wickham, Hadley. 2014. Advanced R. Boca Raton, FL: Chapman & Hall. http://adv-r.had.co.nz/.