All the material presented here, to the extent it is original, is available under CC-BY-SA.
I am running R 3.6.2, with recent update.packages()
.
needed <- c("jpeg", "mapview", "ggplot2", "RColorBrewer", "classInt", "tmap", "sf", "rnaturalearthdata", "rnaturalearth", "matrixStats")
Script and data at https://github.com/rsbivand/BAN422_V20/raw/master/ban422_v20_tues.zip. Download to suitable location, unzip and use as basis.
Today: binning intervals; conditioning (trellis, lattice, facets); and graphics devices and base graphics
Tomorrow: grid graphics; interactive graphics; shiny and dashboards
After the lunch break, we’ll go on to look at graphics devices and base graphics
We saw yesterday that histograms and many other kinds of chart require user choices about the number of bins to be used and bin/class intervals
This may also be termed quantization: the division of part of the real line on which we have measured a variable into intervals
This can also apply to combined categories if they are recoded to reduce the number of alternatives to be displayed
Class intervals are much used in thematic cartography, and I’m the author of the classInt package
QQQ1 <- readRDS("../mon/dicook/pisa_subset.rds")
countries <- readRDS("../mon/dicook/countries.rds")
countries$Alpha_2[countries$Name == "Kosovo"] <- "XK"
a0 <- split(QQQ1, list(QQQ1$CNT, QQQ1$ST004D01T))
math_mean <- sapply(a0, function(x) weighted.mean(x$math_mean, w=x$SENWT))
n2 <- length(math_mean)/2
country <- sapply(strsplit(names(math_mean), "\\."), "[", 1)[1:n2]
co <- match(country, countries$CNT)
gender <- factor(sapply(strsplit(names(math_mean), "\\."), "[", 2))
library(matrixStats)
sqn <- sqrt(sapply(a0, function(x) sum(x$SENWT)))
math_se <- sapply(a0, function(x) weightedSd(x$math_mean, w=x$SENWT))/sqn
read_mean <- sapply(a0, function(x) weighted.mean(x$read_mean, w=x$SENWT))
sci_mean <- sapply(a0, function(x) weighted.mean(x$sci_mean, w=x$SENWT))
math_mean_female <- math_mean[gender == "Female"]
math_mean_male <- math_mean[gender == "Male"]
math_se_female <- math_se[gender == "Female"]
math_se_male <- math_se[gender == "Male"]
math_gap <- math_mean_female - math_mean_male
read_mean_female <- read_mean[gender == "Female"]
read_mean_male <- read_mean[gender == "Male"]
read_gap <- read_mean_female - read_mean_male
sci_mean_female <- sci_mean[gender == "Female"]
sci_mean_male <- sci_mean[gender == "Male"]
sci_gap <- sci_mean_female - sci_mean_male
df <- data.frame(math_mean_female, math_mean_male, math_se_female, math_se_male,
read_mean_female, read_mean_male, sci_mean_female, sci_mean_male,
math_gap, read_gap, sci_gap, iso_a3=country, iso_a2=countries$Alpha_2[co])
We use a 1:50m set of country boundaries (in the 1:110m set, Singapore disappears).
library(rnaturalearth)
library(rnaturalearthdata)
data(countries50)
library(sf)
## Linking to GEOS 3.8.0, GDAL 3.1.0dev-84d972714a, PROJ 6.2.1
countries50 <- st_as_sf(countries50)
countries50$iso_a2[countries50$name == "Kosovo"] <- "XK"
We can merge()
the gaps data with the country boundaries, and the tmap package to create the map. In doing so yesterday, we did not look at the arguments. palette=
sets the colour palette ("div"
for diverging at zero), n=
the intended number of bins/classes, style=
the method for constructing the bins, and midpoint=0
to split the bins on zero.
world_gaps <- merge(countries50[!is.na(countries50$iso_a2),], df, by="iso_a2", all.x=TRUE)
library(tmap)
tm_shape(world_gaps) + tm_fill("math_gap", palette="RdYlGn", n=7, style="jenks", midpoint=0)
## Warning: The shape world_gaps is invalid. See sf::st_is_valid
Class intervals can be chosen in many ways, and some have been collected for convenience in the classInt package
The first problem is to assign class boundaries to values in a single dimension, for which many classification techniques may be used, including pretty, quantile, natural breaks among others, or even simple fixed values
From there, the intervals can be used to generate colours from a colour palette, using the very nice colorRampPalette()
function
Because there are potentially many alternative class memberships even for a given number of classes, choosing a communicative set matters
We may choose the number of intervals ourselves arbitrarily or after examination of the data, or use provided functions, such as nclass.Sturges()
, nclass.scott()
or nclass.FD()
. In hist()
, nclass.Sturges()
is used by default. We can also split on sign()
, but handling diverging intervals often involves more work.
x <- na.exclude(world_gaps$math_gap)
nclass.Sturges(x)
## [1] 8
nclass.scott(x)
## [1] 6
(n <- nclass.FD(x))
## [1] 7
nclass.Sturges(x[sign(x) == 1L])
## [1] 6
nclass.Sturges(x[sign(x) == -1L])
## [1] 7
The default intervals for bins in hist()
are pretty(range(x), n = breaks, min.n = 1)
, where breaks <- nclass.Sturges(x)
. The function computes a sequence of about n+1 equally spaced ‘round’ values which cover the range of the values in x. The values are chosen like values of coins or banknotes (1, 2, 5, etc.)
n
## [1] 7
range(x)
## [1] -26.96514 17.60435
(p <-pretty(x, n=n))
## [1] -30 -25 -20 -15 -10 -5 0 5 10 15 20
length(p)
## [1] 11
(p <-pretty(x, n=n, high.u.bias=3))
## [1] -30 -20 -10 0 10 20
length(p)
## [1] 6
If we use the classIntervals()
function from classInt, we can pass through arguments to the function called through style=
, and note that n
will not necessarily be the number of output classes. By default, intervalClosure=
is "left"
, so [-30, -20)
means numbers greater than and equal to (>=) -30 and less than (<) -20; [10, 20]
is numbers >= 10 and <= 20.
suppressPackageStartupMessages(library(classInt))
(ppd <- classIntervals(x, n=n, style="pretty",
cutlabels=TRUE))
## style: pretty
## one of 49,280,065,120 possible partitions of this variable into 10 classes
## [-30,-25) [-25,-20) [-20,-15) [-15,-10) [-10,-5) [-5,0) [0,5) [5,10)
## 1 2 6 10 20 6 13 7
## [10,15) [15,20]
## 3 1
(pp3 <- classIntervals(x, n=n, style="pretty",
high.u.bias=3, cutlabels=TRUE))
## style: pretty
## one of 814,385 possible partitions of this variable into 5 classes
## [-30,-20) [-20,-10) [-10,0) [0,10) [10,20]
## 3 16 26 20 4
library(RColorBrewer)
pal <- brewer.pal(7, "RdYlGn")
oopar <- par(mfrow=c(1, 2))
plot(ppd, pal=pal, main="Pretty (default)", xlab="", ylab="")
plot(pp3, pal=pal, main="Pretty (compressed)", xlab="", ylab="")
par(oopar)
Quantiles may seem simple, but there are many ways of implementing the breaks, see quantile()
. We can also set the dataPrecision=
to make the class breaks easier to read:
(pq7 <- classIntervals(x, n=n, style="quantile",
type=7L, dataPrecision=2))
## style: quantile
## one of 109,453,344 possible partitions of this variable into 7 classes
## [-26.96,-14.38) [-14.38,-9.75) [-9.75,-6.97) [-6.97,-4.01) [-4.01,1.87)
## 10 10 10 9 10
## [1.87,6.51) [6.51,17.61]
## 10 10
(pq3 <- classIntervals(x, n=n, style="quantile",
type=3L, dataPrecision=2))
## style: quantile
## one of 109,453,344 possible partitions of this variable into 7 classes
## [-26.96,-14.67) [-14.67,-9.96) [-9.96,-7.12) [-7.12,-5.2) [-5.2,1.85)
## 9 10 10 9 10
## [1.85,6.51) [6.51,17.61]
## 10 11
oopar <- par(mfrow=c(1, 2))
plot(pq7, pal=pal, main="Quantile type=7", xlab="", ylab="")
plot(pq3, pal=pal, main="Quantile type=3", xlab="", ylab="")
par(oopar)
The "equal"
style divides the range into n
equal parts, while "sd"
centres and scales the variable before using pretty
on the result, converting back to get the breaks:
(peq <- classIntervals(x, n=n, style="equal",
dataPrecision=2))
## style: equal
## one of 109,453,344 possible partitions of this variable into 7 classes
## [-26.96,-20.59) [-20.59,-14.23) [-14.23,-7.86) [-7.86,-1.49) [-1.49,4.88)
## 3 8 17 14 16
## [4.88,11.24) [11.24,17.61]
## 7 4
diff(peq$brks)
## [1] 6.36707 6.36707 6.36707 6.36707 6.36707 6.36707 6.36707
(psd <- classIntervals(x, n=n, style="sd",
high.u.bias=3, dataPrecision=2))
## style: sd
## one of 10,424,128 possible partitions of this variable into 6 classes
## [-32.85,-23.44) [-23.44,-14.03) [-14.03,-4.62) [-4.62,4.79) [4.79,14.2)
## 1 10 28 19 9
## [14.2,23.61]
## 2
(psd$brks-attr(psd, "parameters")["center"]) /
attr(psd, "parameters")["scale"]
## [1] -3 -2 -1 0 1 2 3
oopar <- par(mfrow=c(1, 2))
plot(peq, pal=pal, main="Equal intervals", xlab="", ylab="")
plot(psd, pal=pal, main="Standard deviations", xlab="", ylab="")
par(oopar)
Hierarchical clustering is a well-known approach to trying to find similarities between multivariate observations based on dissimilarities or distances. If we use distances on the real line, we can build cluster trees for our univariate observations and chosen method=
, and cut the trees for chosen numbers of classes:
(phc7 <- classIntervals(x, n=n, style="hclust",
method="complete", dataPrecision=2))
## style: hclust
## one of 109,453,344 possible partitions of this variable into 7 classes
## [-26.96,-24.51) [-24.51,-13.13) [-13.13,-6.6) [-6.6,-1.45) [-1.45,5.34)
## 1 12 17 12 16
## [5.34,10.39) [10.39,17.61]
## 7 4
(phc5 <- getHclustClassIntervals(phc7, 5))
## style: hclust
## one of 814,385 possible partitions of this variable into 5 classes
## [-26.96514,-24.51651) [-24.51651,-13.1365) [-13.1365,-1.453571)
## 1 12 29
## [-1.453571,10.38568) [10.38568,17.60435]
## 23 4
oopar <- par(mfrow=c(1, 2))
plot(phc7, pal=pal, main="Hierarchical clusters (7)", xlab="", ylab="")
plot(phc5, pal=pal, main="Hierarchical clusters (5)", xlab="", ylab="")
par(oopar)
kmeans
provides a non-hierarchical clustering approach, and can be combined with hierarchical clustering using bagged clustering provided by e1071::bclust()
; here we are using set.seed(1)
to be able to reproduce the output:
set.seed(1)
(pk7 <- classIntervals(x, n=n, style="kmeans",
dataPrecision=2))
## style: kmeans
## one of 109,453,344 possible partitions of this variable into 7 classes
## [-26.96,-17.46) [-17.46,-10.43) [-10.43,-4.51) [-4.51,0.64) [0.64,5.34)
## 5 14 20 7 12
## [5.34,10.39) [10.39,17.61]
## 7 4
set.seed(1)
(pbc7 <- classIntervals(x, n=n, style="bclust",
verbose=FALSE, dataPrecision=2))
## style: bclust
## one of 109,453,344 possible partitions of this variable into 7 classes
## [-26.96,-24.51) [-24.51,-19.14) [-19.14,-13.13) [-13.13,-4.51) [-4.51,5.34)
## 1 3 9 26 19
## [5.34,12.21) [12.21,17.61]
## 8 3
oopar <- par(mfrow=c(1, 2))
plot(pk7, pal=pal, main="K-means clusters", xlab="", ylab="")
plot(pbc7, pal=pal, main="Bagged clusters", xlab="", ylab="")
par(oopar)
There are two implementations of the cartographic natural breaks approach, contributed by Hisaji Ono. As we have already seen, most methods for defining intervals differ in details. As with the cluster approaches, the intention is to place the breaks in obvious gaps in the distribution of the variable.
(pj7 <- classIntervals(x, n=n, style="jenks",
dataPrecision=2))
## style: jenks
## one of 109,453,344 possible partitions of this variable into 7 classes
## [-26.97,-19.94] (-19.94,-13.64] (-13.64,-8.17] (-8.17,-2.3] (-2.3,4.15]
## 4 9 15 14 16
## (4.15,9.2] (9.2,17.6]
## 7 4
(pf7 <- classIntervals(x, n=n, style="fisher",
dataPrecision=2))
## style: fisher
## one of 109,453,344 possible partitions of this variable into 7 classes
## [-26.96,-19.14) [-19.14,-13.13) [-13.13,-7.72) [-7.72,-1.45) [-1.45,5.34)
## 4 9 15 14 16
## [5.34,10.39) [10.39,17.61]
## 7 4
oopar <- par(mfrow=c(1, 2))
plot(pj7, pal=pal, main="Jenks natural breaks", xlab="", ylab="")
plot(pf7, pal=pal, main="Fisher natural breaks", xlab="", ylab="")
par(oopar)
Once found, the breaks need to be applied to the data vector (or vectors - data to be contrasted may need shared intervals). In classInt the findCols()
function can be used, wrapping base::findInterval()
; base::cut()
could also be used:
str(findCols(pj7))
## num [1:69] 6 6 1 1 4 2 5 2 3 3 ...
str(findInterval(x, pj7$brks, all.inside=TRUE,
left.open=TRUE))
## int [1:69] 6 6 1 1 4 2 5 2 3 3 ...
str(as.integer(cut(x, breaks=pj7$brks,
include.lowest=TRUE)))
## int [1:69] 6 6 1 1 4 2 5 2 3 3 ...
suppressPackageStartupMessages(library(ggplot2))
stat_bin
## function (mapping = NULL, data = NULL, geom = "bar", position = "stack",
## ..., binwidth = NULL, bins = NULL, center = NULL, boundary = NULL,
## breaks = NULL, closed = c("right", "left"), pad = FALSE,
## na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
## {
## layer(data = data, mapping = mapping, stat = StatBin, geom = geom,
## position = position, show.legend = show.legend, inherit.aes = inherit.aes,
## params = list(binwidth = binwidth, bins = bins, center = center,
## boundary = boundary, breaks = breaks, closed = closed,
## pad = pad, na.rm = na.rm, ...))
## }
## <bytecode: 0x1cbceda0>
## <environment: namespace:ggplot2>
StatBin
## <ggproto object: Class StatBin, Stat, gg>
## aesthetics: function
## compute_group: function
## compute_layer: function
## compute_panel: function
## default_aes: uneval
## extra_params: na.rm
## finish_layer: function
## non_missing_aes:
## parameters: function
## required_aes: x
## retransform: TRUE
## setup_data: function
## setup_params: function
## super: <ggproto object: Class Stat, gg>
cat(capture.output(print(StatBin$compute_group))[-(1:5)], sep="\n")
## <Inner function (f)>
## function (data, scales, binwidth = NULL, bins = NULL, center = NULL,
## boundary = NULL, closed = c("right", "left"), pad = FALSE,
## breaks = NULL, origin = NULL, right = NULL, drop = NULL,
## width = NULL)
## {
## if (!is.null(breaks)) {
## if (!scales$x$is_discrete()) {
## breaks <- scales$x$transform(breaks)
## }
## bins <- bin_breaks(breaks, closed)
## }
## else if (!is.null(binwidth)) {
## if (is.function(binwidth)) {
## binwidth <- binwidth(data$x)
## }
## bins <- bin_breaks_width(scales$x$dimension(), binwidth,
## center = center, boundary = boundary, closed = closed)
## }
## else {
## bins <- bin_breaks_bins(scales$x$dimension(), bins, center = center,
## boundary = boundary, closed = closed)
## }
## bin_vector(data$x, bins, weight = data$weight, pad = pad)
## }
ggplot2:::bin_breaks_bins
## function (x_range, bins = 30, center = NULL, boundary = NULL,
## closed = c("right", "left"))
## {
## stopifnot(length(x_range) == 2)
## bins <- as.integer(bins)
## if (bins < 1) {
## stop("Need at least one bin.", call. = FALSE)
## }
## else if (zero_range(x_range)) {
## width <- 0.1
## }
## else if (bins == 1) {
## width <- diff(x_range)
## boundary <- x_range[1]
## }
## else {
## width <- (x_range[2] - x_range[1])/(bins - 1)
## }
## bin_breaks_width(x_range, width, boundary = boundary, center = center,
## closed = closed)
## }
## <bytecode: 0x1c036958>
## <environment: namespace:ggplot2>
cat(capture.output(print(ggplot2:::bin_breaks_width))[-(4:8)], sep="\n")
## function (x_range, width = NULL, center = NULL, boundary = NULL,
## closed = c("right", "left"))
## {
## if (!is.null(boundary) && !is.null(center)) {
## stop("Only one of 'boundary' and 'center' may be specified.")
## }
## else if (is.null(boundary)) {
## if (is.null(center)) {
## boundary <- width/2
## }
## else {
## boundary <- center - width/2
## }
## }
## x_range <- as.numeric(x_range)
## width <- as.numeric(width)
## boundary <- as.numeric(boundary)
## shift <- floor((x_range[1] - boundary)/width)
## origin <- boundary + shift * width
## max_x <- x_range[2] + (1 - 1e-08) * width
## breaks <- seq(origin, max_x, width)
## if (length(breaks) > 1e+06) {
## stop("The number of histogram bins must be less than 1,000,000.\nDid you make `binwidth` too small?",
## call. = FALSE)
## }
## bin_breaks(breaks, closed = closed)
## }
## <bytecode: 0x1e1e3a30>
## <environment: namespace:ggplot2>
ggplot2:::bin_breaks
## function (breaks, closed = c("right", "left"))
## {
## bins(breaks, closed)
## }
## <bytecode: 0x3b1c0b8>
## <environment: namespace:ggplot2>
ggplot2:::bins
## function (breaks, closed = c("right", "left"), fuzz = 1e-08 *
## stats::median(diff(breaks)))
## {
## stopifnot(is.numeric(breaks))
## closed <- match.arg(closed)
## breaks <- sort(breaks)
## if (closed == "right") {
## fuzzes <- c(-fuzz, rep.int(fuzz, length(breaks) - 1))
## }
## else {
## fuzzes <- c(rep.int(-fuzz, length(breaks) - 1), fuzz)
## }
## structure(list(breaks = breaks, fuzzy = breaks + fuzzes,
## right_closed = closed == "right"), class = "ggplot2_bins")
## }
## <bytecode: 0x1bb0b1e8>
## <environment: namespace:ggplot2>
We can also pass palettes and class intervals through (maybe need to stretch the min-max interval bounds)
library(mapview)
mapview(world_gaps, zcol="math_gap",
col.regions=pal, at=pj7$brks)
The tmap package provides cartographically informed, grammar of graphics (gg) based functionality now, like ggplot2 using grid graphics.
John McIntosh tried with ggplot2, with quite nice results
I suggested he look at tmap, and things got better, because tmap can switch between interactive and static viewing
tmap also provides direct access to classInt class intervals
Like the sf::plot()
method, tmap plotting can use classInt internally and accepts a palette (try looking at tmaptools::palette_explorer()
for ColorBrewer palettes):
library(tmap)
tm_shape(world_gaps) + tm_fill("math_gap",
n=n, style="jenks", palette=pal, midpoint=0) +
tm_borders(lwd=0.5, alpha=0.4)
## Warning: The shape world_gaps is invalid. See sf::st_is_valid
If we give a short vector of column names, we will get facet/panel displays, but need to use tm_facets(free.scales=FALSE)
to use the same class intervals
tm_shape(world_gaps) + tm_fill(c("math_mean_female", "math_mean_male"), n=7, style="jenks", palette=pal) +
tm_facets(free.scales=FALSE) + tm_borders(lwd=0.5, alpha=0.4) + tm_layout(panel.labels=c("Female", "Male"))
## Warning: The shape world_gaps is invalid. See sf::st_is_valid
The underlying logic of conditioned graphics is that multiple displays (windows, panes) use the same scales and representational elements for comparison
Using the same scales and representational elements for comparison can be done manually, imposing the same scales, colours and shapes in each plot and laying the plots out in a grid
Trellis graphics automated this in S, and lattice provides similar but enhanced facilities in R with a formula interface
ggplot2 and other packages also provide similar functionalities
To try this out, let’s use about 25,000 PISA Science PV means by country, gender and count of books at home, choosing the appropriate country and gender data.frame
objects from the big list:
nordics <- c("DNK", "FIN", "ISL", "NOR", "SWE")
a1 <- a0[which(sapply(strsplit(names(a0), "\\."),
"[", 1) %in% nordics)]
a11 <- do.call("rbind", a1)
a11$CNT <- droplevels(a11$CNT)
levels(a11$ST013Q01TA) <- sub("More than",
">", sub(" books", "", levels(a11$ST013Q01TA)))
saveRDS(a11, "../mon/dicook/a11.rds")
library(lattice)
library(ggplot2)
histogram(~ sci_mean | CNT*ST004D01T, data=a11, main="Science mean PVs", xlab="")
ggplot(a11, aes(x=sci_mean)) + geom_histogram() + facet_wrap(~ ST004D01T + CNT, nrow=2) +
ggtitle("Science mean PVs") + xlab("")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
histogram(~ sci_mean | CNT*ST004D01T, data=a11, main="Science mean PVs", xlab="", type="count")
densityplot(~ sci_mean | ST004D01T, groups=CNT, data=a11, auto.key=list(space="right"),
plot.points=FALSE, main="Science mean PVs", xlab="")
ggplot(a11, aes(x=sci_mean)) + geom_density(aes(colour=CNT)) + facet_wrap(~ ST004D01T, ncol=2) +
ggtitle("Science mean PVs") + xlab("")
densityplot(~ sci_mean | ST004D01T, groups=CNT, data=a11, auto.key=list(space="right"),
plot.points=FALSE, main="Science mean PVs", xlab="", type=c("l", "g"))
densityplot(~ sci_mean | CNT, groups=ST004D01T, a11, auto.key=list(space="right"),
plot.points=FALSE, main="Science mean PVs", xlab="")
ggplot(a11, aes(x=sci_mean)) + geom_density(aes(colour=ST004D01T)) + facet_wrap(~ CNT, ncol=3) +
ggtitle("Science mean PVs") + xlab("")
bwplot(sci_mean ~ ST013Q01TA | ST004D01T, a11, main="Science mean PVs", xlab="books", ylab="")
ggplot(na.omit(a11), aes(ST013Q01TA, sci_mean)) + geom_boxplot(aes(group=ST013Q01TA)) +
facet_wrap(~ ST004D01T) + ggtitle("Science mean PVs") + xlab("books") + ylab("")
bwplot(sci_mean ~ ST013Q01TA | ST004D01T, a11, main="Science mean PVs", xlab="books", ylab="", varwidth=TRUE)
ggplot(na.omit(a11), aes(ST013Q01TA, sci_mean)) + geom_boxplot(aes(group=ST013Q01TA), varwidth=TRUE) +
facet_wrap(~ ST004D01T) + ggtitle("Science mean PVs") + xlab("books") + ylab("")
In early R, all the graphics functionality was in base; graphics was split out of base in 1.9.0 (2004-04-12), and grDevices in 2.0.0 (2004-10-04)
When R starts now, the graphics and grDevices packages are loaded and attached, ready for use
graphics provides the user-level graphical functions and methods, especially the most used plot()
methods that many other packages extend
grDevices provides lower-level interfaces to graphics devices, some of which create files and others display windows
The search()
function shows the packages present in the search path, so here we run an instance of R through system()
to check the startup status. In RStudio, one will also see "tools:rstudio"
in the search path.
cat(system('echo "search()" | R --no-save --vanilla', intern=TRUE)[20:23], sep="\n")
## > search()
## [1] ".GlobalEnv" "package:stats" "package:graphics"
## [4] "package:grDevices" "package:utils" "package:datasets"
## [7] "package:methods" "Autoloads" "package:base"
The PDF (pdf()
) and PostScript (postscript()
) devices write commonly used vector files
Display formats and raster/pixel file devices must rasterize vector graphics input
Display formats vary by platform: X11()
on X11 systems, quartz()
on macOS, windows()
on Windows and are available if compiled
png()
, jpeg()
, tiff()
are generally available, svg()
and cairo_pdf()
and cairo_ps()
may also be built
The capabilities()
function shows what R itself can offer, including non-graphics capabilities, and we can also check the versions of external software used
capabilities()
## jpeg png tiff tcltk X11 aqua
## TRUE TRUE TRUE TRUE TRUE FALSE
## http/ftp sockets libxml fifo cledit iconv
## TRUE TRUE TRUE TRUE FALSE TRUE
## NLS profmem cairo ICU long.double libcurl
## TRUE FALSE TRUE TRUE TRUE TRUE
grSoftVersion()
## cairo libpng jpeg
## "1.16.0" "1.6.37" "6.2"
## libtiff
## "LIBTIFF, Version 4.0.10"
When there is no device active, the current device is the null device at position 1
; one can open several and move between active devices. In RStudio, the devices used vary by context - if the console is used, RStudioGD
will be used, backed by png
; in a notebook, small embedded devices appear (apparently PNG inline images).In addition to providing a device, RStudio appears to work quite hard to embed fonts and crop white space on the edges of the graphics file
a11 <- readRDS("../mon/dicook/a11.rds")
png("plot.png")
dev.cur()
## png
## 3
boxplot(sci_mean ~ ST004D01T, a11)
dev.off()
## png
## 2
png(tempfile())
unlist(dev.capabilities())
## semiTransparency transparentBackground rasterImage
## "TRUE" "semi" "yes"
## capture locator
## "FALSE" "FALSE"
dev.size("in")
## [1] 6.666667 6.666667
dev.interactive()
## [1] FALSE
dev.off()
## png
## 2
The markdown code to include say a logo image is: ![alt text][logo]
, and using this we can display the boxplot created above.
Read in and display plot.png
:
[plot.png][plot.png]
The graphics package provides the basic components for drawing on the devices, and user-level functions bundling the components
A boxplot()
will need to be placed in a plotting area, the boxes and whiskers drawn, axes and ticks drawn, and anotation added
The graphical parameters are set and may be modified using the par()
function; the layout()
function can also be used to position multiple display elements.
Inspecting ?par
shows how much can be modified within the limits of the base graphics system
We can manipulate the number of plots mfrow=
, axis label orientation las=
, line width lwd=
, point symbol pch=
and so on
opar <- par(mfrow=c(2,2))
boxplot(sci_mean ~ ST004D01T, a11, las=0)
boxplot(sci_mean ~ ST004D01T, a11, las=1)
boxplot(sci_mean ~ ST004D01T, a11, lwd=2)
boxplot(sci_mean ~ ST004D01T, a11, pch=4)
par(opar)
We can build up the plot bit-by-bit - here using the axes as an example
opar <- par(mfrow=c(2,2))
boxplot(sci_mean ~ ST004D01T, a11, axes=FALSE)
boxplot(sci_mean ~ ST004D01T, a11, axes=FALSE)
axis(2, las=1)
boxplot(sci_mean ~ ST004D01T, a11, axes=FALSE)
axis(2, las=1)
axis(1, at=1:2, labels=levels(a11$ST004D01T))
boxplot(sci_mean ~ ST004D01T, a11, axes=FALSE)
axis(2, las=1)
axis(1, at=1:2, labels=levels(a11$ST004D01T))
box()
par(opar)
We can manipulate the margins of the plot region within the figure region; layout()
helps us align the output
layout(cbind(c(1, 1), c(2,2)))
opar <- par(mar=c(3, 3, 0, 0)+0.1)
boxplot(sci_mean ~ ST004D01T, a11)
par(mar=c(10, 3, 4, 0)+0.1)
boxplot(sci_mean ~ ST004D01T, a11)
layout(1)
It is tricky to use plot region metrics in RStudioGD, so we’ll try the PNG device to look at device dimensions and to plot some logo images
library("jpeg")
im <- as.raster(readJPEG("image001.jpg"))
prop <- dim(im)[2]/dim(im)[1]
png("plot1.png")
par("din")
## [1] 6.666667 6.666667
plot(1, type="n", axes=FALSE, xlim=c(1, 10),
ylim=c(1, 10), asp=1, ann=FALSE)
rasterImage(im, 1:9, 1:9, (1:9) + prop, 2:10)
dev.off()
## png
## 2
The plot region and figure region are shown using R Graphics code
opar <- par(oma=rep(3, 4), bg="gray80")
plot(c(0, 1), c(0, 1), type="n", ann=FALSE,
axes=FALSE); box("outer", col="gray")
par(xpd=TRUE)
rect(-1, -1, 2, 2, col="gray90")
box("figure"); par(xpd=FALSE)
rect(-1, -1, 2, 2, col="gray80")
box("plot", lty="dashed")
text(.5, .5, "Plot Region")
mtext("Figure Region", side=3, line=2)
for (i in 1:4) mtext(paste("Outer margin", i),
side=i, line=1, outer=TRUE)
par(opar)
Most of base graphics is vector graphics, but some innovations apply both to base and grid
These include Raster Images in R Graphics, and complex paths covered in It’s Not What You Draw,It’s What You Don’t Draw
The gridBase package permits base graphics elements, often created as plot()
methods in contributed packages, to be placed in grid graphics displays
The plot.lm()
method in base package stats shows diagnostic plots, here indicating potential outliers; outlier detection is also provided in OutliersO3 and HDoutliers among others
opar <- par(mfrow=c(2,2))
plot(lm(math_mean ~ read_mean, data=a11), pch=".")
par(opar)