## Hot questions for Using Ggplot2 in ggproto

Question:

I'd like to create a split violin density plot using ggplot, like the fourth example on this page of the seaborn documentation.

Here is some data:

set.seed(20160229) my_data = data.frame( y=c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)), x=c(rep('a', 2000), rep('b', 2000)), m=c(rep('i', 1000), rep('j', 2000), rep('i', 1000)) )

I can plot dodged violins like this:

library('ggplot2') ggplot(my_data, aes(x, y, fill=m)) + geom_violin()

But it's hard to visually compare the widths at different points in the side-by-side distributions. I haven't been able to find any examples of split violins in ggplot - is it possible?

I found a base R graphics solution but the function is quite long and I want to highlight distribution modes, which are easy to add as additional layers in ggplot but will be harder to do if I need to figure out how to edit that function.

Answer:

Or, to avoid fiddling with the densities, you could extend `ggplot2`

's GeomViolin like this:

GeomSplitViolin <- ggproto("GeomSplitViolin", GeomViolin, draw_group = function(self, data, ..., draw_quantiles = NULL) { data <- transform(data, xminv = x - violinwidth * (x - xmin), xmaxv = x + violinwidth * (xmax - x)) grp <- data[1, "group"] newdata <- plyr::arrange(transform(data, x = if (grp %% 2 == 1) xminv else xmaxv), if (grp %% 2 == 1) y else -y) newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ]) newdata[c(1, nrow(newdata) - 1, nrow(newdata)), "x"] <- round(newdata[1, "x"]) if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) { stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1)) quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles) aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), c("x", "y")), drop = FALSE] aesthetics$alpha <- rep(1, nrow(quantiles)) both <- cbind(quantiles, aesthetics) quantile_grob <- GeomPath$draw_panel(both, ...) ggplot2:::ggname("geom_split_violin", grid::grobTree(GeomPolygon$draw_panel(newdata, ...), quantile_grob)) } else { ggplot2:::ggname("geom_split_violin", GeomPolygon$draw_panel(newdata, ...)) } }) geom_split_violin <- function(mapping = NULL, data = NULL, stat = "ydensity", position = "identity", ..., draw_quantiles = NULL, trim = TRUE, scale = "area", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) { layer(data = data, mapping = mapping, stat = stat, geom = GeomSplitViolin, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, na.rm = na.rm, ...)) }

And use the new `geom_split_violin`

like this:

ggplot(my_data, aes(x, y, fill = m)) + geom_split_violin()

Question:

Is it possible to display custom image (say png format) as geom_point in R ggplot?

library(png) pic1 <- readPNG("pic1.png") png("Heatmap.png", units="px", width=3200, height=3200, res=300) ggplot(data_frame, aes(medium, day, fill = Transactions)) + geom_tile(colour="white") + facet_grid(dime3_year~dime3_month) + scale_fill_gradient(high="blue",low="white") + theme_bw() + geom_point(aes(dime3_channel, day, size=Conv,alpha=Conv,image=(annotation_raster(pic1,xmin=0,ymin=0,xmax=5,ymax=5)),color="firebrick")) +

Gives error:

Don't know how to automatically pick scale for object of type proto/environment. Defaulting to continuous Error: Aesthetics must either be length one, or the same length as the dataProblems:(annotation_raster(conv_pic, xmin = 0, ymin = 0, xmax = 5, ymax = 5))

Answer:

The point geom is used to create scatterplots, and doesn't quite seem to be designed to do what you need, ie, display custom images. However, a similar question was answered here, which indicates that the problem can be solved in the following steps:

(1) Read the custom images you want to display,

(2) Render raster objects at the given location, size, and orientation using the `rasterGrob()`

function,

(3) Use a plotting function such as `qplot()`

,

(4) Use a geom such as `annotation_custom()`

for use as static annotations specifying the crude adjustments for x and y limits as mentioned by user20650.

Using the code below, I could get two custom images img1.png and img2.png positioned at the given xmin, xmax, ymin, and ymax.

library(png) library(ggplot2) library(gridGraphics) setwd("c:/MyFolder/") img1 <- readPNG("img1.png") img2 <- readPNG("img2.png") g1 <- rasterGrob(img1, interpolate=FALSE) g2 <- rasterGrob(img2, interpolate=FALSE) qplot(1:10, 1:10, geom="blank") + annotation_custom(g1, xmin=1, xmax=3, ymin=1, ymax=3) + annotation_custom(g2, xmin=7, xmax=9, ymin=7, ymax=9) + geom_point()

Question:

I have a faceted plot with very diverse data. So some facets have only 1 `x`

value, but some others have 13 `x`

values. I know there is the parameter `space='free'`

which adjusts the width of each facet by the data it represents.

My question, is there a possibility to adjust this space manually? Since some of my facets are so small, it is no longer possible to read the labels in the facets. I made a little reproducible example to show what I mean.

df <- data.frame(labelx=rep(c('my long label','short'), c(2,26)), labely=rep(c('a','b'), each=14), x=c(letters[1:2],letters[1:26]), y=LETTERS[6:7], i=rnorm(28)) ggplot(df, aes(x,y,color=i)) + geom_point() + facet_grid(labely~labelx, scales='free_x', space='free_x')

So depending on your screen, the `my long label`

facet gets compressed and you can no longer read the label.

I found a post on the internet which seems to do exactly what I want to do, but this seems to no longer work in `ggplot2`

. The post is from 2010.

https://kohske.wordpress.com/2010/12/25/adjusting-the-relative-space-of-a-facet-grid/

He suggests to use `facet_grid(fac1 + fac2 ~ fac3 + fac4, widths = 1:4, heights = 4:1)`

, so `widths`

and `heights`

to adjust each facet size manually.

Answer:

You can adjust the widths of a ggplot object using grid graphics

g = ggplot(df, aes(x,y,color=i)) + geom_point() + facet_grid(labely~labelx, scales='free_x', space='free_x') library(grid) gt = ggplot_gtable(ggplot_build(g)) gt$widths[4] = 4*gt$widths[4] grid.draw(gt)

With complex graphs with many elements, it can be slightly cumbersome to determine which width it is that you want to alter. In this instance it was grid column 4 that needed to be expanded, but this will vary for different plots. There are several ways to determine which one to change, but a fairly simple and good way is to use `gtable_show_layout`

from the `gtable`

package.

gtable_show_layout(gt)

produces the following image:

in which we can see that the left hand facet is in column number 4. The first 3 columns provide room for the margin, the axis title and the axis labels+ticks. Column 5 is the space between the facets, column 6 is the right hand facet. Columns 7 through 12 are for the right hand facet labels, spaces, the legend, and the right margin.

An alternative to inspecting a graphical representation of the gtable is to simply inspect the table itself. In fact if you need to automate the process, this would be the way to do it. So lets have a look at the TableGrob:

gt # TableGrob (13 x 12) "layout": 25 grobs # z cells name grob # 1 0 ( 1-13, 1-12) background rect[plot.background..rect.399] # 2 1 ( 7- 7, 4- 4) panel-1-1 gTree[panel-1.gTree.283] # 3 1 ( 9- 9, 4- 4) panel-2-1 gTree[panel-3.gTree.305] # 4 1 ( 7- 7, 6- 6) panel-1-2 gTree[panel-2.gTree.294] # 5 1 ( 9- 9, 6- 6) panel-2-2 gTree[panel-4.gTree.316] # 6 3 ( 5- 5, 4- 4) axis-t-1 zeroGrob[NULL] # 7 3 ( 5- 5, 6- 6) axis-t-2 zeroGrob[NULL] # 8 3 (10-10, 4- 4) axis-b-1 absoluteGrob[GRID.absoluteGrob.329] # 9 3 (10-10, 6- 6) axis-b-2 absoluteGrob[GRID.absoluteGrob.336] # 10 3 ( 7- 7, 3- 3) axis-l-1 absoluteGrob[GRID.absoluteGrob.343] # 11 3 ( 9- 9, 3- 3) axis-l-2 absoluteGrob[GRID.absoluteGrob.350] # 12 3 ( 7- 7, 8- 8) axis-r-1 zeroGrob[NULL] # 13 3 ( 9- 9, 8- 8) axis-r-2 zeroGrob[NULL] # 14 2 ( 6- 6, 4- 4) strip-t-1 gtable[strip] # 15 2 ( 6- 6, 6- 6) strip-t-2 gtable[strip] # 16 2 ( 7- 7, 7- 7) strip-r-1 gtable[strip] # 17 2 ( 9- 9, 7- 7) strip-r-2 gtable[strip] # 18 4 ( 4- 4, 4- 6) xlab-t zeroGrob[NULL] # 19 5 (11-11, 4- 6) xlab-b titleGrob[axis.title.x..titleGrob.319] # 20 6 ( 7- 9, 2- 2) ylab-l titleGrob[axis.title.y..titleGrob.322] # 21 7 ( 7- 9, 9- 9) ylab-r zeroGrob[NULL] # 22 8 ( 7- 9,11-11) guide-box gtable[guide-box] # 23 9 ( 3- 3, 4- 6) subtitle zeroGrob[plot.subtitle..zeroGrob.396] # 24 10 ( 2- 2, 4- 6) title zeroGrob[plot.title..zeroGrob.395] # 25 11 (12-12, 4- 6) caption zeroGrob[plot.caption..zeroGrob.397]

The relevant bits are

# cells name # ( 7- 7, 4- 4) panel-1-1 # ( 9- 9, 4- 4) panel-2-1 # ( 6- 6, 4- 4) strip-t-1

in which the names panel-x-y refer to panels in x, y coordinates, and the cells give the coordinates (as ranges) of that named panel in the table. So, for example, the top and bottom left-hand panels both are located in table cells with the column ranges `4- 4`

. (only in column four, that is). The left-hand top strip is also in cell column 4.

If you wanted to use this table to find the relevant width programmatically, rather than manually, (using the top left facet, ie `"panel-1-1"`

as an example) you could use

gt$layout$l[grep('panel-1-1', gt$layout$name)] # [1] 4

Question:

I am having a little trouble with my radar chart in R. Even though the plot is fine I am getting the following warning:

> source('~/.active-rstudio-document') Warning message: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, : duplicated levels in factors are deprecated > radar Warning messages: 1: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, : duplicated levels in factors are deprecated 2: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, : duplicated levels in factors are deprecated

I've seen the same error in other posts but I didn't really understand how to apply the answers to my dataset ...

This is my dataset

MSF,C1,2 OCA,C1,6 SIOA,C1,4 CCFF,C1,4 MSF,C2,4 OCA,C2,2 SIOA,C2,6 CCFF,C2,2 MSF,C3,6 OCA,C3,6 SIOA,C3,6 CCFF,C3,6

And this is the code for the corresponding radar chart (probably only the first part where I define my dataset is relevant but yeah ... that's where I am lost):

colnames(dataset) = c("type", "variable", "value") dataset$value = as.numeric(dataset$value) dataset$variable <- factor(dataset$variable, levels = rev(dataset$variable), ordered=TRUE) # Radar function ------------------------------------------------------------ coord_radar <- function (theta = "x", start = 0, direction = 1) { theta <- match.arg(theta, c("x", "y")) r <- if (theta == "x") "y" else "x" ggproto("CordRadar", CoordPolar, theta = theta, r = r, start = start, direction = sign(direction), is_linear = function(coord) TRUE) } # Radar plot ------------------------------------------------------------ radar <- ggplot(dataset, aes(x = variable, y = value, group=type)) + geom_polygon(aes(group = type, color=type,fill=type), size = 1, alpha=0.1) + scale_fill_manual(values=cbPalette) + geom_line(aes(group = type, color=type)) + scale_colour_manual(values = cbPalette) + coord_radar()

Answer:

Yes, almost all of that is irrelevant to your problem.

You are trying to create a factor with the following levels: `rev(dataset$variable)`

. That yields:

[1] C3 C3 C3 C3 C2 C2 C2 C2 C1 C1 C1

See how you have replicated levels? You'll want to have each level only once, in the order that you want. The default is `sort(unique(dataset$variable))`

, which gives `C1 C2 C3`

, or you could use `rev(unique(dataset$variable)`

to give `C3 C2 C1`

.

The `forcats`

package has several convenience functions to easily make or change factors and the order of their levels.

Question:

This is actually two questions in one (not sure if goes against SO rules, but anyway).

First question is how can I force a `geom_text`

to fit within a `geom_bar`

? (dynamically according to the values plotted)

Looking around, the solutions I found were changing the size of the label. That certainly works, but not for every case. You can change the size for a specific plot to make the text fit within the bar, but when the data changes, you may need to manually change the size of the text again. My real-life problem is that I need to generate the same plot for constantly changing data (daily), so I cannot really manually adjust the size for each plot.

I tried setting the size of the label as a function of the data. It kinda works, not perfectly, but works for many cases.

But here's another problem, even when the label fits within the bar, resizing the plot messes everything up. Looking into it, I also found in the ggplot documentation that

labels do have height and width, but they are physical units, not data units. The amount of space they occupy on that plot is not constant in data units: when you resize a plot, labels stay the same size, but the size of the axes changes.

Which takes me to my second question: Is it possible to change this default behaviour and let/make the labels resize with the plot?

And also let me refine my first question. Is it possible to force a `geom_text`

to fit within a `geom_bar`

, dynamically setting the size of the text using a clever relation between physical units and data units?

So, to follow good practice, here's my reproducible example:

set.seed(1234567) data_gd <- data.frame(x = letters[1:5], y = runif(5, 100, 99999)) ggplot(data = data_gd, mapping = aes(x = x, y = y, fill = x)) + geom_bar(stat = "identity") + geom_text(mapping = aes(label = y, y = y/2))

This code produces this plot:

If I simply resize the plot, "labels stay the same size, but the size of the axes changes" thereby making the labels fit into the bars (now perhaps labels are even too small).

So, this is my second question. It would be nice that the labels resize as well and keep the aspect ration in relation to the bars. Any ideas how to accomplish this or if it is possible at all?

Ok, but going back to how to fit the labels within the bars, the simplest solution is to set the size of the labels.

ggplot(data = data_gd, mapping = aes(x = x, y = y, fill = x)) + geom_bar(stat = "identity") + geom_text(mapping = aes(label = y, y = y/2), size = 3)

Again, this works as shown below, but it is not maintainable /nor robust to changes in the data.

For example, the very same code to generate the plot with different data yields catastrophic results.

data_gd <- data.frame(x = letters[1:30], y = runif(30, 100, 99999)) ggplot(data = data_gd, mapping = aes(x = x, y = y, fill = x)) + geom_bar(stat = "identity") + geom_text(mapping = aes(label = y, y = y/2), size = 3)

And I can go on with the examples, setting the size of the labels as a function of the number of categories on x-axis and so on. But you get the point, and perhaps one of you `ggplot2`

experts can give me ideas.

Answer:

one option might be to write a geom that uses a textGrob with a custom drawDetails method to fit within the allocated space, set by the bar width.

library(grid) library(ggplot2) fitGrob <- function(label, x=0.5, y=0.5, width=1){ grob(x=x, y=y, width=width, label=label, cl = "fit") } drawDetails.fit <- function(x, recording=FALSE){ tw <- sapply(x$label, function(l) convertWidth(grobWidth(textGrob(l)), "native", valueOnly = TRUE)) cex <- x$width / tw grid.text(x$label, x$x, x$y, gp=gpar(cex=cex), default.units = "native") } `%||%` <- ggplot2:::`%||%` GeomFit <- ggproto("GeomFit", GeomRect, required_aes = c("x", "label"), setup_data = function(data, params) { data$width <- data$width %||% params$width %||% (resolution(data$x, FALSE) * 0.9) transform(data, ymin = pmin(y, 0), ymax = pmax(y, 0), xmin = x - width / 2, xmax = x + width / 2, width = NULL ) }, draw_panel = function(self, data, panel_scales, coord, width = NULL) { bars <- ggproto_parent(GeomRect, self)$draw_panel(data, panel_scales, coord) coords <- coord$transform(data, panel_scales) width <- abs(coords$xmax - coords$xmin) tg <- fitGrob(label=coords$label, y = coords$y/2, x = coords$x, width = width) grobTree(bars, tg) } ) geom_fit <- function(mapping = NULL, data = NULL, stat = "count", position = "stack", ..., width = NULL, binwidth = NULL, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) { layer( data = data, mapping = mapping, stat = stat, geom = GeomFit, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list( width = width, na.rm = na.rm, ... ) ) } set.seed(1234567) data_gd <- data.frame(x = letters[1:5], y = runif(5, 100, 99999)) ggplot(data = data_gd, mapping = aes(x = x, y = y, fill = x, label=round(y))) + geom_fit(stat = "identity") + theme()

Question:

Is there a way to get the arrowhead closed on geom_curve? The same code works with geom_segment. Maybe its a bug?

library(tidyverse) set.seed(123) data <- data_frame(x = rnorm(10), y = rnorm(10)) # NO ARROWHEAD FILL ggplot(data, aes(x, y)) + geom_point() + geom_curve(aes(x = 0, y = 0, xend = 1, yend = 1), color = "black", arrow = arrow(type = "closed")) # ARROWHEAD FILL WORKS ggplot(data, aes(x, y)) + geom_point() + geom_segment(aes(x = 0, y = 0, xend = 1, yend = 1), color = "black", arrow = arrow(type = "closed"))

Answer:

I'd call it a bug and you should file an issue. Until then:

geom_curve2 <- function(mapping = NULL, data = NULL, stat = "identity", position = "identity", ..., curvature = 0.5, angle = 90, ncp = 5, arrow = NULL, lineend = "butt", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) { layer( data = data, mapping = mapping, stat = stat, geom = GeomCurve2, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list( arrow = arrow, curvature = curvature, angle = angle, ncp = ncp, lineend = lineend, na.rm = na.rm, ... ) ) } GeomCurve2 <- ggproto("GeomCurve2", GeomSegment, draw_panel = function(data, panel_params, coord, curvature = 0.5, angle = 90, ncp = 5, arrow = NULL, lineend = "butt", na.rm = FALSE) { if (!coord$is_linear()) { warning("geom_curve is not implemented for non-linear coordinates", call. = FALSE) } trans <- coord$transform(data, panel_params) curveGrob( trans$x, trans$y, trans$xend, trans$yend, default.units = "native", curvature = curvature, angle = angle, ncp = ncp, square = FALSE, squareShape = 1, inflect = FALSE, open = TRUE, gp = gpar( col = alpha(trans$colour, trans$alpha), fill = alpha(trans$colour, trans$alpha), lwd = trans$size * .pt, lty = trans$linetype, lineend = lineend), arrow = arrow ) } )

Which leads to:

ggplot(data, aes(x, y)) + geom_point() + geom_curve2(aes(x = 0, y = 0, xend = 1, yend = 1), color = "black", arrow = arrow(type = "closed"))

and

Question:

I've been reading the vignette on extending ggplot2, but I'm a bit stuck on how I can make a single geom that can add multiple geometries to the plot. Multiple geometries already exist in ggplot2 geoms, for example, we have things like `geom_contour`

(multiple paths), and `geom_boxplot`

(multiple paths and points). But I can't quite see how to extend those into new geoms.

Let's say I'm trying to make a `geom_manythings`

that will draw two polygons and one point by computing on a single dataset. One polygon will be a convex hull for all the points, the second polygon will be a convex hull for a subset of the points, and the single point will represent the centre of the data. I want all of these to appear with a call to one geom, rather than three separate calls, as we see here:

# example data set set.seed(9) n <- 1000 x <- data.frame(x = rnorm(n), y = rnorm(n)) # computations for the geometries # chull for all the points hull <- x[chull(x),] # chull for all a subset of the points subset_of_x <- x[x$x > 0 & x$y > 0 , ] hull_of_subset <- subset_of_x[chull(subset_of_x), ] # a point in the centre of the data centre_point <- data.frame(x = mean(x$x), y = mean(x$y)) # plot library(ggplot2) ggplot(x, aes(x, y)) + geom_point() + geom_polygon(data = x[chull(x),], alpha = 0.1) + geom_polygon(data = hull_of_subset, alpha = 0.3) + geom_point(data = centre_point, colour = "green", size = 3)

I want to have a `geom_manythings`

to replace the three `geom_*`

in the code above.

In an attempt to make a custom geom, I started with code in `geom_tufteboxplot`

and `geom_boxplot`

as templates, along with the 'extending ggplot2' vignette:

library(ggplot2) library(proto) GeomManythings <- ggproto( "GeomManythings", GeomPolygon, setup_data = function(self, data, params) { data <- ggproto_parent(GeomPolygon, self)$setup_data(data, params) data }, draw_group = function(data, panel_scales, coord) { n <- nrow(data) if (n <= 2) return(grid::nullGrob()) common <- data.frame( colour = data$colour, size = data$size, linetype = data$linetype, fill = alpha(data$fill, data$alpha), group = data$group, stringsAsFactors = FALSE ) # custom bits... # polygon hull for all points hull <- data[chull(data), ] hull_df <- data.frame(x = hull$x, y = hull$y, common, stringsAsFactors = FALSE) hull_grob <- GeomPolygon$draw_panel(hull_df, panel_scales, coord) # polygon hull for subset subset_of_x <- data[data$x > 0 & data$y > 0 ,] hull_of_subset <- subset_of_x[chull(subset_of_x),] hull_of_subset_df <- data.frame(x = hull_of_subset$x, y = hull_of_subset$y, common, stringsAsFactors = FALSE) hull_of_subset_grob <- GeomPolygon$draw_panel(hull_of_subset_df, panel_scales, coord) # point for centre point centre_point <- data.frame(x = mean(coords$x), y = coords(data$y), common, stringsAsFactors = FALSE) centre_point_grob <- GeomPoint$draw_panel(centre_point, panel_scales, coord) # end of custom bits ggname("geom_mypolygon", grobTree(hull_grob, hull_of_subset_grob, centre_point_grob)) }, required_aes = c("x", "y"), draw_key = draw_key_polygon, default_aes = aes( colour = "grey20", fill = "grey20", size = 0.5, linetype = 1, alpha = 1, ) ) geom_manythings <- function(mapping = NULL, data = NULL, stat = "identity", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...) { layer( geom = GeomManythings, mapping = mapping, data = data, stat = stat, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...) ) }

But clearly there are quite a few things not right in this geom, I must be missing some fundamental details...

ggplot(x, aes(x, y)) + geom_point() + geom_manythings()

How can I write this geom to get the desired result?

Answer:

there are quite a few issues in your code, so I suggest you try with a simplified case first. In particular, the chull calculation was problematic. Try this,

library(ggplot2) library(proto) library(grid) GeomManythings <- ggproto( "GeomManythings", Geom, setup_data = function(self, data, params) { data <- ggproto_parent(Geom, self)$setup_data(data, params) data }, draw_group = function(data, panel_scales, coord) { n <- nrow(data) if (n <= 2) return(grid::nullGrob()) # polygon hull for all points hull_df <- data[chull(data[,c("x", "y")]), ] hull_grob <- GeomPolygon$draw_panel(hull_df, panel_scales, coord) # polygon hull for subset subset_of_x <- data[data$x > 0 & data$y > 0 ,] hull_of_subset_df <-subset_of_x[chull(subset_of_x[,c("x", "y")]),] hull_of_subset_df$fill <- "red" # testing hull_of_subset_grob <- GeomPolygon$draw_panel(hull_of_subset_df, panel_scales, coord) coords <- coord$transform(data, panel_scales) pg <- pointsGrob(x=mean(coords$x), y=mean(coords$y), default.units = "npc", gp=gpar(col="green", cex=3)) ggplot2:::ggname("geom_mypolygon", grobTree(hull_grob, hull_of_subset_grob, pg)) }, required_aes = c("x", "y"), draw_key = draw_key_polygon, default_aes = aes( colour = "grey20", fill = "grey50", size = 0.5, linetype = 1, alpha = 0.5 ) ) geom_manythings <- function(mapping = NULL, data = NULL, stat = "identity", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...) { layer( geom = GeomManythings, mapping = mapping, data = data, stat = stat, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...) ) } set.seed(9) n <- 20 d <- data.frame(x = rnorm(n), y = rnorm(n)) ggplot(d, aes(x, y)) + geom_manythings()+ geom_point()

(disclaimer: I haven't tried to write a geom in 5 years, so I don't know how it works nowadays)

Question:

I would like to save some specifications of a ggplot command for later, given that I need to run several different graphs that all share some scale aesthetics.

Let's say I would like save this for later:

my.scale_aes <- scale_x_continuous(...) + scale_color_manual(...)

This would of course prompt an error message, indicating that you cannot add ggproto objects together without a direct `ggplot()`

call. But is that really the case? And is there another way by which I could still add these components together?

I read somewhere else that it has to do with the different methods of adding elements together: `methods("+")`

and that what I need has something to do with `+.gg*`

but I have no idea how to implement this and how to make it work.

Answer:

You can do this by defining a list of the `ggplot`

terms you want and adding them in.

library(ggplot2) my.scale_aes <- list( scale_x_continuous(breaks = c(56, 60, 61)), scale_color_manual(values = c("black", "red")) ) ggplot(data = diamonds[1:100,], aes(depth, price, color = cut == "Ideal")) + geom_point() + my.scale_aes

Question:

I know that in `ggplot2`

one can add the convex hull to a scatterplot by group as in

library(ggplot2) library(plyr) data(iris) df<-iris find_hull <- function(df) df[chull(df$Sepal.Length, df$Sepal.Width), ] hulls <- ddply(df, "Species", find_hull) plot <- ggplot(data = df, aes(x = Sepal.Length, y = Sepal.Width, colour=Species, fill = Species)) + geom_point() + geom_polygon(data = hulls, alpha = 0.5) + labs(x = "Sepal.Length", y = "Sepal.Width") plot

I was wondering though how one could calculate and add alpha bags instead, i.e. the largest convex hull that contains at least a proportion 1-alpha of all the points? Either in 2d (to display with ggplot2) or 3d (to display with rgl).

EDIT: My initial idea was be to keep on "peeling" the convex hull for as along as the criterion of containing at least a given % of points would be satisfied, although in the paper here it seems they use a different algorithm (isodepth, which seems to be implemented in R package depth, in function isodepth and aplpack::plothulls seems also close to what I want (although it produces a full plot as opposed to just the contour), so I think with these I may be sorted. Though these function only works in 2D, and I would also be interested in a 3D extension (to be plotted in rgl). If anyone has any pointers let me know!

EDIT2: with function `depth::isodepth`

I found a 2d solution (see post below), although I am still looking for a 3D solution as well - if anyone would happen to know how to do that, please let me know!

Answer:

Ha with the help of function `depth::isodepth`

I came up with the following solution - here I find the alpha bag that contains a proportion of at least 1-alpha of all points :

library(mgcv) library(depth) library(plyr) library(ggplot2) data(iris) df=iris[,c(1,2,5)] alph=0.05 find_bag = function(x,alpha=alph) { n=nrow(x) target=1-alpha propinside=1 d=1 while (propinside>target) { p=isodepth(x[,1:2],dpth=d,output=T, mustdith=T)[[1]] ninside=sum(in.out(p,as.matrix(x[,1:2],ncol=2))*1) nonedge=sum(sapply(1:nrow(p),function (row) nrow(merge(round(setNames(as.data.frame(p[row,,drop=F]),names(x)[1:2]),5),as.data.frame(x[,1:2])))>0)*1)-3 propinside=(ninside+nonedge)/n d=d+1 } p=isodepth(x[,1:2],dpth=d-1,output=T, mustdith=T)[[1]] p } bags <- ddply(df, "Species", find_bag,alpha=alph) names(bags) <- c("Species",names(df)[1:2]) plot <- ggplot(data = df, aes(x = Sepal.Length, y = Sepal.Width, colour=Species, fill = Species)) + geom_point() + geom_polygon(data = bags, alpha = 0.5) + labs(x = "Sepal.Length", y = "Sepal.Width") plot

EDIT2: Using my original idea of convex hull peeling I also came up with the following solution which now works in 2d & 3d; the result is not quite the same is with the isodepth algorithm, but it's pretty close :

# in 2d library(plyr) library(ggplot2) data(iris) df=iris[,c(1,2,5)] alph=0.05 find_bag = function(x,alpha=alph) { n=nrow(x) propinside=1 target=1-alpha x2=x while (propinside>target) { propinside=nrow(x2)/n hull=chull(x2) x2old=x2 x2=x2[-hull,] } x2old[chull(x2old),] } bags <- ddply(df, "Species", find_bag, alpha=alph) plot <- ggplot(data = df, aes(x = Sepal.Length, y = Sepal.Width, colour=Species, fill = Species)) + geom_point() + geom_polygon(data = bags, alpha = 0.5) + labs(x = "Sepal.Length", y = "Sepal.Width") plot

# in 3d library(plyr) library(ggplot2) data(iris) df=iris[,c(1,2,3,5)] levels=unique(df[,"Species"]) nlevels=length(levels) zoom=0.8 cex=1 aspectr=c(1,1,0.7) pointsalpha=1 userMatrix=matrix(c(0.80,-0.60,0.022,0,0.23,0.34,0.91,0,-0.55,-0.72,0.41,0,0,0,0,1),ncol=4,byrow=T) windowRect=c(0,29,1920,1032) cols=c("red","forestgreen","blue") alph=0.05 plotbag = function(x,alpha=alph,grp=1,cols=c("red","forestgreen","blue"),transp=0.2) { propinside=1 target=1-alpha x2=x levels=unique(x2[,ncol(x2)]) x2=x2[x2[,ncol(x2)]==levels[[grp]],] n=nrow(x2) while (propinside>target) { propinside=nrow(x2)/n hull=unique(as.vector(convhulln(as.matrix(x2[,1:3]), options = "Tv"))) x2old=x2 x2=x2[-hull,] } ids=t(convhulln(as.matrix(x2old[,1:3]), options = "Tv")) rgl.triangles(x2old[ids,1],x2old[ids,2],x2old[ids,3],col=cols[[grp]],alpha=transp,shininess=50) } open3d(zoom=zoom,userMatrix=userMatrix,windowRect=windowRect,antialias=8) for (i in 1:nlevels) { plot3d(x=df[df[,ncol(df)]==levels[[i]],][,1], y=df[df[,ncol(df)]==levels[[i]],][,2], z=df[df[,ncol(df)]==levels[[i]],][,3], type="s", col=cols[[i]], size=cex, lit=TRUE, alpha=pointsalpha,point_antialias=TRUE, line_antialias=TRUE,shininess=50, add=TRUE) plotbag(df,alpha=alph, grp=i, cols=c("red","forestgreen","blue"), transp=0.3) } axes3d(color="black",drawfront=T,box=T,alpha=1) title3d(color="black",xlab=names(df)[[1]],ylab=names(df)[[2]],zlab=names(df)[[3]],alpha=1) aspect3d(aspectr)

Question:

I'm often using boxplots in my work and like `ggplot2`

aesthetics. But standard `geom_boxplot`

lacks two things important for me: ends of whiskers and median labels. Thanks to information from here I've written a function:

gBoxplot <- function(formula = NULL, data = NULL, font = "CMU Serif", fsize = 18){ require(ggplot2) vars <- all.vars(formula) response <- vars[1] factor <- vars[2] # A function for medians labelling fun_med <- function(x){ return(data.frame(y = median(x), label = round(median(x), 3))) } p <- ggplot(data, aes_string(x = factor, y = response)) + stat_boxplot(geom = "errorbar", width = 0.6) + geom_boxplot() + stat_summary(fun.data = fun_med, geom = "label", family = font, size = fsize/3, vjust = -0.1) + theme_grey(base_size = fsize, base_family = font) return(p) }

There are also font settings, but this is just because I'm too lazy to make a theme. Here is an example:

gBoxplot(hwy ~ class, mpg)

Good enough for me, but there are some restrictictions (cannot use auto-dodging, etc.), and it will be better to make a new geom based on `geom_boxplot`

. I've read the vignette Extending ggplot2, but cannot understand how to implement it.
Any help will be appreciated.

Answer:

So been thinking about this one for a while. Basically when you create a new primitive, you normally write a combination of:

- A
**layer-function** - A
**stat-ggproto**, - A
**geom-ggproto**

Only the **layer-function** need be visible to the user. You only need to write a **stat-ggproto** if you need some new way of transforming your data to make your primitive. And you only need write a **geom-ggproto** if you have some new grid-based graphics to create.

In this case, where we are basically composting **layer-function** that already exist, we don’t really need to write new ggprotos. It is enough to write a new **layer-function**. This **layer-function** will create the three layers that you already are using and map the parameters the way you intend. In this case:

- Layer1 – uses
`geom_errorbar`

and`stat_boxplot`

– to get our errorbars - Layer2 – uses
`geom_boxplot`

and`stat_boxplot`

- to create the boxplots - Layer3 – users
`geom_label`

and`stat_summary`

- to create the text labels with the mean value in the center of the boxes.

Of course you could write a new **stat-ggproto** and a new **geom-ggproto** that do all of these things at once. Or maybe you compost `stat_summary`

and `stat_boxplot`

into one, and the three **geom-protos** as well, and this do this with one layer. But there is little point unless we have efficiency problems.

Anyway, here is the code:

geom_myboxplot <- function(formula = NULL, data = NULL, stat = "boxplot", position = "dodge",coef=1.5, font = "sans", fsize = 18, width=0.6, fun.data = NULL, fun.y = NULL, fun.ymax = NULL, fun.ymin = NULL, fun.args = list(), outlier.colour = NULL, outlier.color = NULL, outlier.shape = 19, outlier.size = 1.5,outlier.stroke = 0.5, notch = FALSE, notchwidth = 0.5,varwidth = FALSE, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE,...) { vars <- all.vars(formula) response <- vars[1] factor <- vars[2] mymap <- aes_string(x=factor,y=response) fun_med <- function(x) { return(data.frame(y = median(x), label = round(median(x), 3))) } position <- position_dodge(width) l1 <- layer(data = data, mapping = mymap, stat = StatBoxplot, geom = "errorbar", position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(na.rm = na.rm, coef = coef, width = width, ...)) l2 <- layer(data = data, mapping = mymap, stat = stat, geom = GeomBoxplot, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(outlier.colour = outlier.colour, outlier.shape = outlier.shape, outlier.size = outlier.size, outlier.stroke = outlier.stroke, notch = notch, notchwidth = notchwidth, varwidth = varwidth, na.rm = na.rm, ...)) l3 <- layer(data = data, mapping = mymap, stat = StatSummary, geom = "label", position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(fun.data = fun_med, fun.y = fun.y, fun.ymax = fun.ymax, fun.ymin = fun.ymin, fun.args = fun.args, na.rm=na.rm,family=font,size=fsize/3,vjust=-0.1,...)) return(list(l1,l2,l3)) }

which allows you to create your customized boxplots it now like this:

ggplot(mpg) + geom_myboxplot( hwy ~ class, font = "sans",fsize = 18)+ theme_grey(base_family = "sans",base_size = 18 )

And they look like this:

**Note**: we did not actually have to use the `layer`

function, we could have used the orginal `stat_boxplot`

, `geom_boxplot`

, and `stat_summary`

calls in their place. But we still would have had to fill in all the parameters if we wanted to be able to control them from our custom boxplot, so I think it was clearer this way - at least from the point-of-view of structure as opposed to functionality. Maybe it isn't though, it is a matter of taste...

Also I don't have that font which does look a lot nicer. But I did not feel like tracking it down and installing it.

Question:

In order to plot half densities, I am using the function described in this post: Split violin plot with ggplot2

However, when I want to draw the quantiles on the densities, like on a normal `geom_violin()`

or `geom_boxplot()`

, I obtain an error message.

I would also be interested in adding the number of observations above each half density.

Here is an example of what I would like to obtain:

data("diamonds") library(ggplot2) # Function described in a previous post GeomSplitViolin <- ggproto("GeomSplitViolin", GeomViolin, draw_group = function(self, data, ..., draw_quantiles = NULL){ data <- transform(data, xminv = x - violinwidth * (x - xmin), xmaxv = x + violinwidth * (xmax - x)) grp <- data[1,'group'] newdata <- plyr::arrange(transform(data, x = if(grp%%2==1) xminv else xmaxv), if(grp%%2==1) y else -y) newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ]) newdata[c(1,nrow(newdata)-1,nrow(newdata)), 'x'] <- round(newdata[1, 'x']) if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) { stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1)) quantiles <- create_quantile_segment_frame(data, draw_quantiles) aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), c("x", "y")), drop = FALSE] aesthetics$alpha <- rep(1, nrow(quantiles)) both <- cbind(quantiles, aesthetics) quantile_grob <- GeomPath$draw_panel(both, ...) ggplot2:::ggname("geom_split_violin", grobTree(GeomPolygon$draw_panel(newdata, ...), quantile_grob)) } else { ggplot2:::ggname("geom_split_violin", GeomPolygon$draw_panel(newdata, ...)) } }) geom_split_violin <- function (mapping = NULL, data = NULL, stat = "ydensity", position = "identity", ..., draw_quantiles = NULL, trim = TRUE, scale = "area", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) { layer(data = data, mapping = mapping, stat = stat, geom = GeomSplitViolin, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, na.rm = na.rm, ...)) } tmp <- diamonds[which(diamonds$cut %in% c("Fair", "Good")), ] # Obtained plot ggplot(tmp, aes(as.factor(color), carat, fill = cut)) + geom_split_violin() # Error due to internal functions (interleave, ...) ggplot(tmp, aes(as.factor(color), carat, fill = cut)) + geom_split_violin(draw_quantiles = 0.5) # Function to return number of observation give_n = function(x, y_up = y_upper) { data.frame(y = y_up * 1.06, label = paste("n =", length(x)) ) } # Code to add number of observations above each half density new_plot = given_plot + # Give back only length of data stat_summary(fun.data = give_n, aes(x = as.factor(variable)), geom = "text")

Answer:

We can make further adjustments to the function by @YAK, and add some adjustments to `create_quantile_segment_frame`

:

GeomSplitViolin <- ggproto("GeomSplitViolin", GeomViolin, draw_group = function(self, data, ..., draw_quantiles = NULL) { # Original function by Jan Gleixner (@jan-glx) # Adjustments by Wouter van der Bijl (@Axeman) data <- transform(data, xminv = x - violinwidth * (x - xmin), xmaxv = x + violinwidth * (xmax - x)) grp <- data[1, "group"] newdata <- plyr::arrange(transform(data, x = if (grp %% 2 == 1) xminv else xmaxv), if (grp %% 2 == 1) y else -y) newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ]) newdata[c(1, nrow(newdata) - 1, nrow(newdata)), "x"] <- round(newdata[1, "x"]) if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) { stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1)) quantiles <- create_quantile_segment_frame(data, draw_quantiles, split = TRUE, grp = grp) aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), c("x", "y")), drop = FALSE] aesthetics$alpha <- rep(1, nrow(quantiles)) both <- cbind(quantiles, aesthetics) quantile_grob <- GeomPath$draw_panel(both, ...) ggplot2:::ggname("geom_split_violin", grid::grobTree(GeomPolygon$draw_panel(newdata, ...), quantile_grob)) } else { ggplot2:::ggname("geom_split_violin", GeomPolygon$draw_panel(newdata, ...)) } } ) create_quantile_segment_frame <- function(data, draw_quantiles, split = FALSE, grp = NULL) { dens <- cumsum(data$density) / sum(data$density) ecdf <- stats::approxfun(dens, data$y) ys <- ecdf(draw_quantiles) violin.xminvs <- (stats::approxfun(data$y, data$xminv))(ys) violin.xmaxvs <- (stats::approxfun(data$y, data$xmaxv))(ys) violin.xs <- (stats::approxfun(data$y, data$x))(ys) if (grp %% 2 == 0) { data.frame( x = ggplot2:::interleave(violin.xs, violin.xmaxvs), y = rep(ys, each = 2), group = rep(ys, each = 2) ) } else { data.frame( x = ggplot2:::interleave(violin.xminvs, violin.xs), y = rep(ys, each = 2), group = rep(ys, each = 2) ) } } geom_split_violin <- function(mapping = NULL, data = NULL, stat = "ydensity", position = "identity", ..., draw_quantiles = NULL, trim = TRUE, scale = "area", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) { layer(data = data, mapping = mapping, stat = stat, geom = GeomSplitViolin, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, na.rm = na.rm, ...)) }

Then simply plot:

ggplot(diamonds[which(diamonds$cut %in% c("Fair", "Good")), ], aes(as.factor(color), carat, fill = cut)) + geom_split_violin(draw_quantiles = c(0.25, 0.5, 0.75))

Question:

I am following up a discussion started at: How can I make geom_area() leave a gap for missing values?. It seems like geom_ribbon is not longer leaving gaps for missing values. Please try to execute the reproducible example in the attached link. I cannot get the answer described. Can you?

Answer:

Looks like a bug in `ggplot2`

, there seems to be a missing `handle_na`

function that needs to be added as a part of a new unified way of dealing with NA values.

##### Update:

The first post here refined an entire new `ggproto`

to fix this, but I realized
that as a one-liner workaround you can just override the `handle_na`

function like I do in the code below (`# fix GeomRibbon`

):

require(dplyr) require(ggplot2) require(grid) set.seed(1) test <- data.frame(x = rep(1:10, 3), y = abs(rnorm(30)), z = rep(LETTERS[1:3], 10)) %>% arrange(x, z) test[test$x == 4, "y"] <- NA test$ymax <- test$y test$ymin <- 0 zl <- levels(test$z) for (i in 2:length(zl)) { zi <- test$z == zl[i] zi_1 <- test$z == zl[i - 1] test$ymin[zi] <- test$ymax[zi_1] test$ymax[zi] <- test$ymin[zi] + test$ymax[zi] } # fix GeomRibbon GeomRibbon$handle_na <- function(data, params) { data } ggplot(test, aes(x = x,y=y, ymax = ymax, ymin = ymin, fill = z)) + geom_ribbon() + scale_x_continuous(breaks = 1:10)

yielding:

Question:

I'm trying to write a custom `stat_*`

for `ggplot2`

, where I would like to color a 2D loess surface using tiles. When I start from the extension guide, I can write a stat_chull like they do:

stat_chull = function(mapping = NULL, data = NULL, geom = "polygon", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...) { chull = ggproto("chull", Stat, compute_group = function(data, scales) { data[chull(data$x, data$y), , drop = FALSE] }, required_aes = c("x", "y") ) layer( stat = chull, data = data, mapping = mapping, geom = geom, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...) ) }

This work for both the simple call and facet wrapping:

ggplot(mpg, aes(x=displ, y=hwy)) + geom_point() + stat_chull() # optionally + facet_wrap(~ class)

When I write my `stat_loess2d`

, I can also visualize all classes or an individual class:

stat_loess2d = function(mapping = NULL, data = NULL, geom = "tile", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...) { loess2d = ggproto("loess2d", Stat, compute_group = function(data, scales) { dens = MASS::kde2d(data$x, data$y) lsurf = loess(fill ~ x + y, data=data) df = data.frame(x = rep(dens$x, length(dens$y)), y = rep(dens$y, each=length(dens$x)), dens = c(dens$z)) df$fill = predict(lsurf, newdata=df[c("x", "y")]) df }, required_aes = c("x", "y", "fill") ) layer( stat = loess2d, data = data, mapping = mapping, geom = geom, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...) ) } ggplot(mpg, aes(x=displ, y=hwy, fill=year)) + geom_point(aes(color=year)) + stat_loess2d() ggplot(mpg[mpg$class == "compact",], aes(x=displ, y=hwy, fill=year)) + geom_point(aes(color=year)) + stat_loess2d()

However, when I try to facet the above, tiles are no longer shown:

ggplot(mpg, aes(x=displ, y=hwy, fill=year)) + geom_point(aes(color=year)) + stat_loess2d() + facet_wrap(~ class)

Can someone tell me what I'm doing wrong here?

Answer:

##### Explanation

The main issue I see here actually lies beyond what you have done, & is related to how `geom_tile`

handles tile creation across different facets, when the specific x / y axis values differ significantly. An older question demonstrated a similar phenomenon: `geom_tile`

works fine with each facet's data on its own, but put them together, and the tiles shrink to match the smallest difference between different facets' values. This leaves vast amounts of white space in the plot layer, and usually gets progressively worse with every additional facet, until the tiles themselves become practically invisible.

To get around this, I would add a data processing step *after* the density / loess calculations for each facet, to standardize the range of x and y values across all facets.

Some elaboration in case you are not very familiar with the relationship between `compute_layer`

, `compute_panel`

, and `compute_group`

(I certainly wasn't when I started messing around with ggproto objects...):

Essentially, all

`Stat*`

objects have these three functions to bridge the gap between a given dataframe (`mpg`

in this case), and what's received by the`Geom*`

side of things.Of the three,

`compute_layer`

is the top-level function, and typically triggers`compute_panel`

to calculate a separate dataframe for each facet / panel (the terminology used in exported functions is facet, but the underlying package code refers to the same as panel; I'm not sure why either). In turn,`compute_panel`

triggers`compute_group`

to calculate a separate dataframe for each group (as defined by the`group`

/`colour`

/`fill`

/ etc. aesthetic parameters).The results from

`compute_group`

are returned to`compute_panel`

and combined into one dataframe. Likewise,`compute_layer`

receives one dataframe from each facet's`compute_panel`

, and combines them together again. The combined dataframe is then passed on to`Geom*`

to draw.

*(Above is the generic set-up as defined in the top-level Stat. Other Stat* objects inheriting from Stat may override the behaviour in any of the steps. For example, StatIdentity's compute_layer returns the original dataframe as-is, without triggering compute_panel / compute_group at all, because there is no need to do so for unchanged data.)*

For this use case, we can modify the code in `compute_layer`

, after the results have been returned from `compute_panel`

/ `compute_group`

and combined together, to interpolate values associated with each facet into common bins. Because common bins = nice large tiles without white space in between.

##### Modification

Here's how I would have written the `loess2d`

ggproto object, with an additional definition for `compute_layer`

:

loess2d = ggproto("loess2d", Stat, compute_group = function(data, scales) { dens = MASS::kde2d(data$x, data$y) lsurf = loess(fill ~ x + y, data=data) df = data.frame(x = rep(dens$x, length(dens$y)), y = rep(dens$y, each=length(dens$x)), dens = c(dens$z)) df$fill = predict(lsurf, newdata=df[c("x", "y")]) df }, compute_layer = function(self, data, params, layout) { # no change from Stat$compute_layer in this chunk, except # for liberal usage of `ggplot2:::` to utilise un-exported # functions from the package ggplot2:::check_required_aesthetics(self$required_aes, c(names(data), names(params)), ggplot2:::snake_class(self)) data <- remove_missing(data, params$na.rm, c(self$required_aes, self$non_missing_aes), ggplot2:::snake_class(self), finite = TRUE) params <- params[intersect(names(params), self$parameters())] args <- c(list(data = quote(data), scales = quote(scales)), params) df <- plyr::ddply(data, "PANEL", function(data) { scales <- layout$get_scales(data$PANEL[1]) tryCatch(do.call(self$compute_panel, args), error = function(e) { warning("Computation failed in `", ggplot2:::snake_class(self), "()`:\n", e$message, call. = FALSE) data.frame() }) }) # define common x/y grid range across all panels # (length = 25 chosen to match the default value for n in MASS::kde2d) x.range <- seq(min(df$x), max(df$x), length = 25) y.range <- seq(min(df$y), max(df$y), length = 25) # interpolate each panel's data to a common grid, # with NA values for regions where each panel doesn't # have data (this can be changed via the extrap # parameter in akima::interp, but I think # extrapolating may create misleading visuals) df <- df %>% tidyr::nest(-PANEL) %>% mutate(data = purrr::map(data, ~akima::interp(x = .x$x, y = .x$y, z = .x$fill, xo = x.range, yo = y.range, nx = 25, ny = 25) %>% akima::interp2xyz(data.frame = TRUE) %>% rename(fill = z))) %>% tidyr::unnest() return(df) }, required_aes = c("x", "y", "fill") )

Usage:

ggplot(mpg, aes(x=displ, y=hwy, fill=year)) + stat_loess2d() + facet_wrap(~ class) # this does trigger warnings (not errors) because some of the facets contain # really very few observations. if we filter for facets with more rows of data # in the original dataset, this wouldn't be an issue ggplot(mpg %>% filter(!class %in% c("2seater", "minivan")), aes(x=displ, y=hwy, fill=year)) + stat_loess2d() + facet_wrap(~ class) # no warnings triggered

Question:

I am trying to create a new geometry for `ggplot`

as described here, while adapting it to deal with Simple Features objects.

As an example, let's take the same exercise of plotting the convex hull of a set of points. Thus, I wrote a new `geom_envelope()`

function borrowing elements from `geom_sf()`

and a corresponding `GeomEnvelope`

`ggproto`

object that performs the computation overriding the `draw_group()`

method (since I want a single polygon for the full set of points).

However, I must be missing something, since I can't get the polygon plotted. I've been trying for a while but I either get errors or nothing plotted.

library(sf); library(ggplot2); library(dplyr) Npts <- 10 pts <- matrix(runif(2*Npts), ncol = 2) %>% st_multipoint() %>% st_sfc() %>% st_cast("POINT") %>% st_sf() GeomEnvelope <- ggproto( "GeomEnvelope", GeomSf, required_aes = "geometry", default_aes = aes( shape = NULL, colour = "grey20", fill = "white", size = NULL, linetype = 1, alpha = 0.5, stroke = 0.5 ), draw_key = draw_key_polygon, draw_group = function(data, panel_params, coord) { n <- nrow(data) if (n <= 2) return(grid::nullGrob()) gp <- gpar( colour = data$colour, size = data$size, linetype = data$linetype, fill = alpha(data$fill, data$alpha), group = data$group, stringsAsFactors = FALSE ) geometry <- sf::st_convex_hull(st_combine(sf::st_as_sf(data))) sf::st_as_grob(geometry, pch = data$shape, gp = gp) } ) geom_envelope <- function( mapping = aes(), data = NULL, position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...) { if (!is.null(data) && ggplot2:::is_sf(data)) { geometry_col <- attr(data, "sf_column") } else { geometry_col <- "geometry" } if (is.null(mapping$geometry)) { mapping$geometry <- as.name(geometry_col) } c( layer( geom = GeomEnvelope, data = data, mapping = mapping, stat = "identity", position = position, show.legend = if (is.character(show.legend)) TRUE else show.legend, inherit.aes = inherit.aes, params = list( na.rm = na.rm, legend = if (is.character(show.legend)) show.legend else "polygon", ... ) ), coord_sf(default = TRUE) ) } ggplot(pts) + geom_sf() + geom_envelope() + theme_bw()

Created on 2019-04-23 by the reprex package (v0.2.1)

Answer:

If this is your actual use case (rather than a simplified example of it), then I'd say the essential part of what you are looking for is a custom **Stat**, not a custom **Geom**. Data computations / manipulations should happen within the former.

(For reference, I usually look at the code in `GeomBoxplot`

/ `StatBoxplot`

to figure out where things should happen, since that use case includes a bunch of calculations for quantiles / outliers, as well as the combination of different grob elements that accept various aesthetic mappings.)

Data with random seed for reproducibility:

set.seed(123) pts <- matrix(runif(2*Npts), ncol = 2) %>% st_multipoint() %>% st_sfc() %>% st_cast("POINT") %>% st_sf()

##### Basic demonstration

The following `StatEnvelope`

would take the dataset passed to the relevant geom layer, and convert the collection of geometry values within each group (if no grouping aesthetic is specified, the entire dataset will be treated as one group) into a convex hull:

StatEnvelope <- ggproto( "StatEnvelope", Stat, required_aes = "geometry", compute_group = function(data, scales) { if(nrow(data) <= 2) return (NULL) data %>% group_by_at(vars(-geometry)) %>% summarise(geometry = sf::st_convex_hull(sf::st_combine(geometry))) %>% ungroup() } ) ggplot(pts) + geom_sf() + geom_sf(stat = StatEnvelope, alpha = 0.5, color = "grey20", fill = "white", size = 0.5) + theme_bw()

##### Upgrade

The above approach, using the existing `geom_sf`

, does a perfectly passable job at creating the envelope. If we want to specify some default aesthetic parameters, rather than repeat within every instance of `geom_sf`

, we *still* don't need to define a new Geom. A function that modifies the existing `geom_sf`

would do fine.

geom_envelope <- function(...){ suppressWarnings(geom_sf(stat = StatEnvelope, ..., # any aesthetic argument specified in the function # will take precedence over the default arguments # below, with suppressWarning to mute warnings on # any duplicated aesthetics alpha = 0.5, color = "grey20", fill = "white", size = 0.5)) } # outputs same plot as before ggplot(pts) + geom_sf() + geom_envelope() + theme_bw() # with different aesthetic specifications for demonstration ggplot(pts) + geom_sf() + geom_envelope(alpha = 0.1, colour = "brown", fill = "yellow", size = 3) + theme_bw()

##### Explanation for what's going on for the code posted in the question

When I mess around with customised ggproto objects, one useful trick I like to use is to insert print statements within every function I modify, e.g. `"setting up parameters"`

, or `"drawing panel, step 3"`

, etc. This allows me to have a good idea of what's happening beneath the hood, and track where things went wrong when the function (inevitably) returns an error on the 1st / 2nd / ... / nth try.

In this case, if we insert `print("draw group")`

at the beginning of `GeomEnvelope`

's `draw_group`

function before running `ggplot(pts) + geom_sf() + geom_envelope() + theme_bw()`

, we'll observe the absence of any printed message in the console. In other words, **the draw_group function was never called**, so any data manipulation defined therein has no effect on the output.

There are several `draw_*`

functions in `Geom*`

, which can be confusing when we want to make modifications. From the code for `Geom`

, we can see the hierarchy is as follows:

`draw_layer`

(which includes a`do.call(self$draw_panel, args)`

line)`draw_panel`

(which includes a`self$draw_group(group, panel_params, coord, ...)`

line)`draw_group`

(which is not implemented for`Geom`

).

So `draw_layer`

triggers `draw_panel`

, and `draw_panel`

triggers `draw_group`

. (Mirroring this, in `Stat`

, `compute_layer`

triggers `compute_panel`

and `compute_panel`

triggers `compute_group`

.)

`GeomSf`

, which inherits from `Geom`

(code here), overrides `Geom`

's `draw_panel`

function with a chunk of code that returns a `sf_grob(...)`

, and ** DOES NOT** trigger

`draw_group`

.Consequently, when `GeomEnvelope`

inherits `GeomSf`

's `draw_panel`

function, nothing in its `draw_group`

function is going to matter. What's drawn in the plot depends on `draw_panel`

, and the `geom_envelope`

layer in the question performs essentially the same task as `geom_sf`

, plotting each individual point separately. If you remove / comment out the `geom_sf`

layer, you'll see the same points; just with color = "grey20", alpha = 0.5, etc., as specified in `GeomSf`

's `default_aes`

.

*(Note: fill = "white" isn't in use, because geom_sf defaults to GeomPoint's default aesthetics for point data, which means it inherits GeomPoint's pch = 19 for its point shape, and plots a solid circle unaffected by any fill value.)*