Hot questions for Using Ggplot2 in ggproto

Question:

I'd like to create a split violin density plot using ggplot, like the fourth example on this page of the seaborn documentation.

Here is some data:

set.seed(20160229)

my_data = data.frame(
    y=c(rnorm(1000), rnorm(1000, 0.5), rnorm(1000, 1), rnorm(1000, 1.5)),
    x=c(rep('a', 2000), rep('b', 2000)),
    m=c(rep('i', 1000), rep('j', 2000), rep('i', 1000))
)

I can plot dodged violins like this:

library('ggplot2')

ggplot(my_data, aes(x, y, fill=m)) +
  geom_violin()

But it's hard to visually compare the widths at different points in the side-by-side distributions. I haven't been able to find any examples of split violins in ggplot - is it possible?

I found a base R graphics solution but the function is quite long and I want to highlight distribution modes, which are easy to add as additional layers in ggplot but will be harder to do if I need to figure out how to edit that function.


Answer:

Or, to avoid fiddling with the densities, you could extend ggplot2's GeomViolin like this:

GeomSplitViolin <- ggproto("GeomSplitViolin", GeomViolin, 
                           draw_group = function(self, data, ..., draw_quantiles = NULL) {
  data <- transform(data, xminv = x - violinwidth * (x - xmin), xmaxv = x + violinwidth * (xmax - x))
  grp <- data[1, "group"]
  newdata <- plyr::arrange(transform(data, x = if (grp %% 2 == 1) xminv else xmaxv), if (grp %% 2 == 1) y else -y)
  newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ])
  newdata[c(1, nrow(newdata) - 1, nrow(newdata)), "x"] <- round(newdata[1, "x"])

  if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
    stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <=
      1))
    quantiles <- ggplot2:::create_quantile_segment_frame(data, draw_quantiles)
    aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), c("x", "y")), drop = FALSE]
    aesthetics$alpha <- rep(1, nrow(quantiles))
    both <- cbind(quantiles, aesthetics)
    quantile_grob <- GeomPath$draw_panel(both, ...)
    ggplot2:::ggname("geom_split_violin", grid::grobTree(GeomPolygon$draw_panel(newdata, ...), quantile_grob))
  }
  else {
    ggplot2:::ggname("geom_split_violin", GeomPolygon$draw_panel(newdata, ...))
  }
})

geom_split_violin <- function(mapping = NULL, data = NULL, stat = "ydensity", position = "identity", ..., 
                              draw_quantiles = NULL, trim = TRUE, scale = "area", na.rm = FALSE, 
                              show.legend = NA, inherit.aes = TRUE) {
  layer(data = data, mapping = mapping, stat = stat, geom = GeomSplitViolin, 
        position = position, show.legend = show.legend, inherit.aes = inherit.aes, 
        params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, na.rm = na.rm, ...))
}

And use the new geom_split_violin like this:

ggplot(my_data, aes(x, y, fill = m)) + geom_split_violin()

Question:

Is it possible to display custom image (say png format) as geom_point in R ggplot?

library(png)
pic1 <- readPNG("pic1.png")

png("Heatmap.png", units="px", width=3200, height=3200, res=300)
ggplot(data_frame, aes(medium, day, fill = Transactions))  +
   geom_tile(colour="white")  +
   facet_grid(dime3_year~dime3_month) + 
   scale_fill_gradient(high="blue",low="white") +
   theme_bw() + 
   geom_point(aes(dime3_channel, day, size=Conv,alpha=Conv,image=(annotation_raster(pic1,xmin=0,ymin=0,xmax=5,ymax=5)),color="firebrick")) +

Gives error:

Don't know how to automatically pick scale for object of type proto/environment. Defaulting to continuous Error: Aesthetics must either be length one, or the same length as the dataProblems:(annotation_raster(conv_pic, xmin = 0, ymin = 0, xmax = 5, ymax = 5))


Answer:

The point geom is used to create scatterplots, and doesn't quite seem to be designed to do what you need, ie, display custom images. However, a similar question was answered here, which indicates that the problem can be solved in the following steps:

(1) Read the custom images you want to display,

(2) Render raster objects at the given location, size, and orientation using the rasterGrob() function,

(3) Use a plotting function such as qplot(),

(4) Use a geom such as annotation_custom() for use as static annotations specifying the crude adjustments for x and y limits as mentioned by user20650.

Using the code below, I could get two custom images img1.png and img2.png positioned at the given xmin, xmax, ymin, and ymax.

library(png)
library(ggplot2)
library(gridGraphics)
setwd("c:/MyFolder/")

img1 <- readPNG("img1.png")
img2 <- readPNG("img2.png")
g1 <- rasterGrob(img1, interpolate=FALSE)
g2 <- rasterGrob(img2, interpolate=FALSE)
qplot(1:10, 1:10, geom="blank") + 
  annotation_custom(g1, xmin=1, xmax=3, ymin=1, ymax=3) +
  annotation_custom(g2, xmin=7, xmax=9, ymin=7, ymax=9) +  
  geom_point()

Question:

I have a faceted plot with very diverse data. So some facets have only 1 x value, but some others have 13 x values. I know there is the parameter space='free' which adjusts the width of each facet by the data it represents.

My question, is there a possibility to adjust this space manually? Since some of my facets are so small, it is no longer possible to read the labels in the facets. I made a little reproducible example to show what I mean.

df <- data.frame(labelx=rep(c('my long label','short'), c(2,26)),
                 labely=rep(c('a','b'), each=14),
                 x=c(letters[1:2],letters[1:26]),
                 y=LETTERS[6:7],
                 i=rnorm(28))
ggplot(df, aes(x,y,color=i)) +
  geom_point() +
  facet_grid(labely~labelx, scales='free_x', space='free_x')

So depending on your screen, the my long label facet gets compressed and you can no longer read the label.

I found a post on the internet which seems to do exactly what I want to do, but this seems to no longer work in ggplot2. The post is from 2010.

https://kohske.wordpress.com/2010/12/25/adjusting-the-relative-space-of-a-facet-grid/

He suggests to use facet_grid(fac1 + fac2 ~ fac3 + fac4, widths = 1:4, heights = 4:1), so widths and heights to adjust each facet size manually.


Answer:

You can adjust the widths of a ggplot object using grid graphics

g = ggplot(df, aes(x,y,color=i)) +
  geom_point() +
  facet_grid(labely~labelx, scales='free_x', space='free_x')

library(grid)
gt = ggplot_gtable(ggplot_build(g))
gt$widths[4] = 4*gt$widths[4]
grid.draw(gt)

With complex graphs with many elements, it can be slightly cumbersome to determine which width it is that you want to alter. In this instance it was grid column 4 that needed to be expanded, but this will vary for different plots. There are several ways to determine which one to change, but a fairly simple and good way is to use gtable_show_layout from the gtable package.

gtable_show_layout(gt)

produces the following image:

in which we can see that the left hand facet is in column number 4. The first 3 columns provide room for the margin, the axis title and the axis labels+ticks. Column 5 is the space between the facets, column 6 is the right hand facet. Columns 7 through 12 are for the right hand facet labels, spaces, the legend, and the right margin.

An alternative to inspecting a graphical representation of the gtable is to simply inspect the table itself. In fact if you need to automate the process, this would be the way to do it. So lets have a look at the TableGrob:

gt
# TableGrob (13 x 12) "layout": 25 grobs
#     z         cells       name                                   grob
# 1   0 ( 1-13, 1-12) background        rect[plot.background..rect.399]
# 2   1 ( 7- 7, 4- 4)  panel-1-1               gTree[panel-1.gTree.283]
# 3   1 ( 9- 9, 4- 4)  panel-2-1               gTree[panel-3.gTree.305]
# 4   1 ( 7- 7, 6- 6)  panel-1-2               gTree[panel-2.gTree.294]
# 5   1 ( 9- 9, 6- 6)  panel-2-2               gTree[panel-4.gTree.316]
# 6   3 ( 5- 5, 4- 4)   axis-t-1                         zeroGrob[NULL]
# 7   3 ( 5- 5, 6- 6)   axis-t-2                         zeroGrob[NULL]
# 8   3 (10-10, 4- 4)   axis-b-1    absoluteGrob[GRID.absoluteGrob.329]
# 9   3 (10-10, 6- 6)   axis-b-2    absoluteGrob[GRID.absoluteGrob.336]
# 10  3 ( 7- 7, 3- 3)   axis-l-1    absoluteGrob[GRID.absoluteGrob.343]
# 11  3 ( 9- 9, 3- 3)   axis-l-2    absoluteGrob[GRID.absoluteGrob.350]
# 12  3 ( 7- 7, 8- 8)   axis-r-1                         zeroGrob[NULL]
# 13  3 ( 9- 9, 8- 8)   axis-r-2                         zeroGrob[NULL]
# 14  2 ( 6- 6, 4- 4)  strip-t-1                          gtable[strip]
# 15  2 ( 6- 6, 6- 6)  strip-t-2                          gtable[strip]
# 16  2 ( 7- 7, 7- 7)  strip-r-1                          gtable[strip]
# 17  2 ( 9- 9, 7- 7)  strip-r-2                          gtable[strip]
# 18  4 ( 4- 4, 4- 6)     xlab-t                         zeroGrob[NULL]
# 19  5 (11-11, 4- 6)     xlab-b titleGrob[axis.title.x..titleGrob.319]
# 20  6 ( 7- 9, 2- 2)     ylab-l titleGrob[axis.title.y..titleGrob.322]
# 21  7 ( 7- 9, 9- 9)     ylab-r                         zeroGrob[NULL]
# 22  8 ( 7- 9,11-11)  guide-box                      gtable[guide-box]
# 23  9 ( 3- 3, 4- 6)   subtitle  zeroGrob[plot.subtitle..zeroGrob.396]
# 24 10 ( 2- 2, 4- 6)      title     zeroGrob[plot.title..zeroGrob.395]
# 25 11 (12-12, 4- 6)    caption   zeroGrob[plot.caption..zeroGrob.397]

The relevant bits are

#         cells       name  
# ( 7- 7, 4- 4)  panel-1-1      
# ( 9- 9, 4- 4)  panel-2-1              
# ( 6- 6, 4- 4)  strip-t-1

in which the names panel-x-y refer to panels in x, y coordinates, and the cells give the coordinates (as ranges) of that named panel in the table. So, for example, the top and bottom left-hand panels both are located in table cells with the column ranges 4- 4. (only in column four, that is). The left-hand top strip is also in cell column 4.

If you wanted to use this table to find the relevant width programmatically, rather than manually, (using the top left facet, ie "panel-1-1" as an example) you could use

gt$layout$l[grep('panel-1-1', gt$layout$name)]
# [1] 4

Question:

I am having a little trouble with my radar chart in R. Even though the plot is fine I am getting the following warning:

> source('~/.active-rstudio-document')
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
> radar
Warning messages:
1: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
2: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated

I've seen the same error in other posts but I didn't really understand how to apply the answers to my dataset ...

This is my dataset

MSF,C1,2
OCA,C1,6
SIOA,C1,4
CCFF,C1,4
MSF,C2,4
OCA,C2,2
SIOA,C2,6
CCFF,C2,2
MSF,C3,6
OCA,C3,6
SIOA,C3,6
CCFF,C3,6

And this is the code for the corresponding radar chart (probably only the first part where I define my dataset is relevant but yeah ... that's where I am lost):

colnames(dataset) = c("type", "variable", "value")
dataset$value = as.numeric(dataset$value)

dataset$variable <- factor(dataset$variable, levels = rev(dataset$variable), ordered=TRUE)

# Radar function ------------------------------------------------------------
coord_radar <- function (theta = "x", start = 0, direction = 1) {
  theta <- match.arg(theta, c("x", "y"))
  r <- if (theta == "x")
    "y"
  else "x"
  ggproto("CordRadar", CoordPolar, theta = theta, r = r, start = start,
          direction = sign(direction),
          is_linear = function(coord) TRUE)
}


# Radar plot ------------------------------------------------------------
radar <- ggplot(dataset, aes(x = variable, y = value, group=type)) +
  geom_polygon(aes(group = type, color=type,fill=type), size = 1, alpha=0.1) + 
  scale_fill_manual(values=cbPalette) +
  geom_line(aes(group = type, color=type)) + 
  scale_colour_manual(values = cbPalette) + 
  coord_radar() 

Answer:

Yes, almost all of that is irrelevant to your problem.

You are trying to create a factor with the following levels: rev(dataset$variable). That yields:

[1] C3 C3 C3 C3 C2 C2 C2 C2 C1 C1 C1

See how you have replicated levels? You'll want to have each level only once, in the order that you want. The default is sort(unique(dataset$variable)), which gives C1 C2 C3, or you could use rev(unique(dataset$variable) to give C3 C2 C1.

The forcats package has several convenience functions to easily make or change factors and the order of their levels.

Question:

This is actually two questions in one (not sure if goes against SO rules, but anyway).

First question is how can I force a geom_text to fit within a geom_bar? (dynamically according to the values plotted)

Looking around, the solutions I found were changing the size of the label. That certainly works, but not for every case. You can change the size for a specific plot to make the text fit within the bar, but when the data changes, you may need to manually change the size of the text again. My real-life problem is that I need to generate the same plot for constantly changing data (daily), so I cannot really manually adjust the size for each plot.

I tried setting the size of the label as a function of the data. It kinda works, not perfectly, but works for many cases.

But here's another problem, even when the label fits within the bar, resizing the plot messes everything up. Looking into it, I also found in the ggplot documentation that

labels do have height and width, but they are physical units, not data units. The amount of space they occupy on that plot is not constant in data units: when you resize a plot, labels stay the same size, but the size of the axes changes.

Which takes me to my second question: Is it possible to change this default behaviour and let/make the labels resize with the plot?

And also let me refine my first question. Is it possible to force a geom_text to fit within a geom_bar, dynamically setting the size of the text using a clever relation between physical units and data units?

So, to follow good practice, here's my reproducible example:

set.seed(1234567)
data_gd <- data.frame(x = letters[1:5], 
                      y = runif(5, 100, 99999))

ggplot(data = data_gd,
       mapping = aes(x = x, y = y, fill = x)) +
    geom_bar(stat = "identity") +
    geom_text(mapping = aes(label = y, y = y/2))

This code produces this plot:

If I simply resize the plot, "labels stay the same size, but the size of the axes changes" thereby making the labels fit into the bars (now perhaps labels are even too small).

So, this is my second question. It would be nice that the labels resize as well and keep the aspect ration in relation to the bars. Any ideas how to accomplish this or if it is possible at all?

Ok, but going back to how to fit the labels within the bars, the simplest solution is to set the size of the labels.

ggplot(data = data_gd,
       mapping = aes(x = x, y = y, fill = x)) +
    geom_bar(stat = "identity") +
    geom_text(mapping = aes(label = y, y = y/2), size = 3)

Again, this works as shown below, but it is not maintainable /nor robust to changes in the data.

For example, the very same code to generate the plot with different data yields catastrophic results.

data_gd <- data.frame(x = letters[1:30], 
                      y = runif(30, 100, 99999))
ggplot(data = data_gd,
       mapping = aes(x = x, y = y, fill = x)) +
    geom_bar(stat = "identity") +
    geom_text(mapping = aes(label = y, y = y/2), size = 3)

And I can go on with the examples, setting the size of the labels as a function of the number of categories on x-axis and so on. But you get the point, and perhaps one of you ggplot2 experts can give me ideas.


Answer:

one option might be to write a geom that uses a textGrob with a custom drawDetails method to fit within the allocated space, set by the bar width.

library(grid)
library(ggplot2)

fitGrob <- function(label, x=0.5, y=0.5, width=1){
  grob(x=x, y=y, width=width, label=label, cl = "fit")
}
drawDetails.fit <- function(x, recording=FALSE){
  tw <- sapply(x$label, function(l) convertWidth(grobWidth(textGrob(l)), "native", valueOnly = TRUE))
  cex <- x$width / tw
  grid.text(x$label, x$x, x$y, gp=gpar(cex=cex), default.units = "native")
}


`%||%` <- ggplot2:::`%||%`

GeomFit <- ggproto("GeomFit", GeomRect,
                   required_aes = c("x", "label"),

                   setup_data = function(data, params) {
                     data$width <- data$width %||%
                       params$width %||% (resolution(data$x, FALSE) * 0.9)
                     transform(data,
                               ymin = pmin(y, 0), ymax = pmax(y, 0),
                               xmin = x - width / 2, xmax = x + width / 2, width = NULL
                     )
                   },
                   draw_panel = function(self, data, panel_scales, coord, width = NULL) {
                     bars <- ggproto_parent(GeomRect, self)$draw_panel(data, panel_scales, coord)
                     coords <- coord$transform(data, panel_scales)    
                     width <- abs(coords$xmax - coords$xmin)
                     tg <- fitGrob(label=coords$label, y = coords$y/2, x = coords$x, width = width)

                     grobTree(bars, tg)
                   }
)

geom_fit <- function(mapping = NULL, data = NULL,
                     stat = "count", position = "stack",
                     ...,
                     width = NULL,
                     binwidth = NULL,
                     na.rm = FALSE,
                     show.legend = NA,
                     inherit.aes = TRUE) {

  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomFit,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      width = width,
      na.rm = na.rm,
      ...
    )
  )
}


set.seed(1234567)
data_gd <- data.frame(x = letters[1:5], 
                      y = runif(5, 100, 99999))

ggplot(data = data_gd,
       mapping = aes(x = x, y = y, fill = x, label=round(y))) +
  geom_fit(stat = "identity") +
  theme()

Question:

Is there a way to get the arrowhead closed on geom_curve? The same code works with geom_segment. Maybe its a bug?

library(tidyverse)

set.seed(123)

data <- data_frame(x = rnorm(10), y = rnorm(10))

# NO ARROWHEAD FILL

ggplot(data, aes(x, y)) + 
  geom_point() +
  geom_curve(aes(x = 0, y = 0, xend = 1, yend = 1),
             color = "black",
             arrow = arrow(type = "closed"))

# ARROWHEAD FILL WORKS

ggplot(data, aes(x, y)) + 
  geom_point() +
  geom_segment(aes(x = 0, y = 0, xend = 1, yend = 1),
             color = "black",
             arrow = arrow(type = "closed"))

Answer:

I'd call it a bug and you should file an issue. Until then:

geom_curve2 <- function(mapping = NULL, data = NULL,
                       stat = "identity", position = "identity",
                       ...,
                       curvature = 0.5,
                       angle = 90,
                       ncp = 5,
                       arrow = NULL,
                       lineend = "butt",
                       na.rm = FALSE,
                       show.legend = NA,
                       inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomCurve2,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      arrow = arrow,
      curvature = curvature,
      angle = angle,
      ncp = ncp,
      lineend = lineend,
      na.rm = na.rm,
      ...
    )
  )
}

GeomCurve2 <- ggproto("GeomCurve2", GeomSegment,
  draw_panel = function(data, panel_params, coord, curvature = 0.5, angle = 90,
                        ncp = 5, arrow = NULL, lineend = "butt", na.rm = FALSE) {
    if (!coord$is_linear()) {
      warning("geom_curve is not implemented for non-linear coordinates",
        call. = FALSE)
    }
    trans <- coord$transform(data, panel_params)

    curveGrob(
      trans$x, trans$y, trans$xend, trans$yend,
      default.units = "native",
      curvature = curvature, angle = angle, ncp = ncp,
      square = FALSE, squareShape = 1, inflect = FALSE, open = TRUE,
      gp = gpar(
        col = alpha(trans$colour, trans$alpha),
        fill = alpha(trans$colour, trans$alpha),
        lwd = trans$size * .pt,
        lty = trans$linetype,
        lineend = lineend),
      arrow = arrow
    )
  }
)

Which leads to:

ggplot(data, aes(x, y)) + 
  geom_point() +
  geom_curve2(aes(x = 0, y = 0, xend = 1, yend = 1),
             color = "black",
             arrow = arrow(type = "closed")) 

and

Question:

I've been reading the vignette on extending ggplot2, but I'm a bit stuck on how I can make a single geom that can add multiple geometries to the plot. Multiple geometries already exist in ggplot2 geoms, for example, we have things like geom_contour (multiple paths), and geom_boxplot (multiple paths and points). But I can't quite see how to extend those into new geoms.

Let's say I'm trying to make a geom_manythings that will draw two polygons and one point by computing on a single dataset. One polygon will be a convex hull for all the points, the second polygon will be a convex hull for a subset of the points, and the single point will represent the centre of the data. I want all of these to appear with a call to one geom, rather than three separate calls, as we see here:

# example data set
set.seed(9)
n <- 1000
x <- data.frame(x = rnorm(n),
                y = rnorm(n))

# computations for the geometries 
# chull for all the points
hull <-  x[chull(x),]
# chull for all a subset of the points
subset_of_x <- x[x$x > 0 & x$y > 0 , ]
hull_of_subset <- subset_of_x[chull(subset_of_x), ]
# a point in the centre of the data
centre_point <- data.frame(x = mean(x$x), y = mean(x$y))

# plot
library(ggplot2)
ggplot(x, aes(x, y)) +
  geom_point() + 
  geom_polygon(data = x[chull(x),], alpha = 0.1) +
  geom_polygon(data = hull_of_subset, alpha = 0.3) +
  geom_point(data = centre_point, colour = "green", size = 3)

I want to have a geom_manythings to replace the three geom_* in the code above.

In an attempt to make a custom geom, I started with code in geom_tufteboxplot and geom_boxplot as templates, along with the 'extending ggplot2' vignette:

library(ggplot2)
library(proto)

GeomManythings <- ggproto(
  "GeomManythings",
  GeomPolygon,
  setup_data = function(self, data, params) {
    data <- ggproto_parent(GeomPolygon, self)$setup_data(data, params)
    data
  },

  draw_group = function(data, panel_scales, coord) {
    n <- nrow(data)
    if (n <= 2)
      return(grid::nullGrob())

    common <- data.frame(
      colour = data$colour,
      size = data$size,
      linetype = data$linetype,
      fill = alpha(data$fill, data$alpha),
      group = data$group,
      stringsAsFactors = FALSE
    )

    # custom bits...

    # polygon hull for all points
    hull <-  data[chull(data), ]
    hull_df <- data.frame(x = hull$x, 
                          y = hull$y, 
                          common, 
                          stringsAsFactors = FALSE)
    hull_grob <-
      GeomPolygon$draw_panel(hull_df, panel_scales, coord)

    # polygon hull for subset
    subset_of_x <-
      data[data$x > 0 & data$y > 0 ,]
    hull_of_subset <-
      subset_of_x[chull(subset_of_x),]
    hull_of_subset_df <- data.frame(x = hull_of_subset$x, 
                                    y = hull_of_subset$y, 
                                    common, 
                                    stringsAsFactors = FALSE)
    hull_of_subset_grob <-
      GeomPolygon$draw_panel(hull_of_subset_df, panel_scales, coord)

    # point for centre point
    centre_point <-
      data.frame(x = mean(coords$x), 
                 y = coords(data$y),
                 common, 
                 stringsAsFactors = FALSE)

    centre_point_grob <-
      GeomPoint$draw_panel(centre_point, panel_scales, coord)

    # end of custom bits

    ggname("geom_mypolygon",
           grobTree(hull_grob,
                    hull_of_subset_grob,
                    centre_point_grob))


  },

  required_aes = c("x", "y"),

  draw_key = draw_key_polygon,

  default_aes = aes(
    colour = "grey20",
    fill = "grey20",
    size = 0.5,
    linetype = 1,
    alpha = 1,
  )
)

geom_manythings <-
  function(mapping = NULL,
           data = NULL,
           stat = "identity",
           position = "identity",
           na.rm = FALSE,
           show.legend = NA,
           inherit.aes = TRUE,
           ...) {
    layer(
      geom = GeomManythings,
      mapping = mapping,
      data = data,
      stat = stat,
      position = position,
      show.legend = show.legend,
      inherit.aes = inherit.aes,
      params = list(na.rm = na.rm, ...)
    )
  }

But clearly there are quite a few things not right in this geom, I must be missing some fundamental details...

ggplot(x, aes(x, y)) +
  geom_point() +
  geom_manythings()

How can I write this geom to get the desired result?


Answer:

there are quite a few issues in your code, so I suggest you try with a simplified case first. In particular, the chull calculation was problematic. Try this,

library(ggplot2)
library(proto)
library(grid)

GeomManythings <- ggproto(
  "GeomManythings",
  Geom,
  setup_data = function(self, data, params) {
    data <- ggproto_parent(Geom, self)$setup_data(data, params)
    data
  },

  draw_group = function(data, panel_scales, coord) {
    n <- nrow(data)
    if (n <= 2)
      return(grid::nullGrob())


    # polygon hull for all points
    hull_df <-  data[chull(data[,c("x", "y")]), ]

    hull_grob <-
      GeomPolygon$draw_panel(hull_df, panel_scales, coord)

    # polygon hull for subset
    subset_of_x <-
      data[data$x > 0 & data$y > 0 ,]
    hull_of_subset_df <-subset_of_x[chull(subset_of_x[,c("x", "y")]),]
    hull_of_subset_df$fill <- "red" # testing
    hull_of_subset_grob <-  GeomPolygon$draw_panel(hull_of_subset_df, panel_scales, coord)

    coords <- coord$transform(data, panel_scales)     

    pg <- pointsGrob(x=mean(coords$x), y=mean(coords$y), 
                     default.units = "npc", gp=gpar(col="green", cex=3))

    ggplot2:::ggname("geom_mypolygon",
                     grobTree(hull_grob,
                              hull_of_subset_grob, pg))


  },


  required_aes = c("x", "y"),

  draw_key = draw_key_polygon,

  default_aes = aes(
    colour = "grey20",
    fill = "grey50",
    size = 0.5,
    linetype = 1,
    alpha = 0.5
  )
)

geom_manythings <-
  function(mapping = NULL,
           data = NULL,
           stat = "identity",
           position = "identity",
           na.rm = FALSE,
           show.legend = NA,
           inherit.aes = TRUE,
           ...) {
    layer(
      geom = GeomManythings,
      mapping = mapping,
      data = data,
      stat = stat,
      position = position,
      show.legend = show.legend,
      inherit.aes = inherit.aes,
      params = list(na.rm = na.rm, ...)
    )
  }


set.seed(9)
n <- 20
d <- data.frame(x = rnorm(n),
                y = rnorm(n))

ggplot(d, aes(x, y)) +
  geom_manythings()+
  geom_point() 

(disclaimer: I haven't tried to write a geom in 5 years, so I don't know how it works nowadays)

Question:

I would like to save some specifications of a ggplot command for later, given that I need to run several different graphs that all share some scale aesthetics.

Let's say I would like save this for later:

my.scale_aes <- scale_x_continuous(...) + scale_color_manual(...)

This would of course prompt an error message, indicating that you cannot add ggproto objects together without a direct ggplot() call. But is that really the case? And is there another way by which I could still add these components together?

I read somewhere else that it has to do with the different methods of adding elements together: methods("+") and that what I need has something to do with +.gg* but I have no idea how to implement this and how to make it work.


Answer:

You can do this by defining a list of the ggplot terms you want and adding them in.

library(ggplot2)

my.scale_aes <- list(
  scale_x_continuous(breaks = c(56, 60, 61)),
  scale_color_manual(values = c("black", "red"))
)

ggplot(data = diamonds[1:100,],
       aes(depth, price, color = cut == "Ideal")) +
  geom_point() +
  my.scale_aes 

Question:

I know that in ggplot2 one can add the convex hull to a scatterplot by group as in

library(ggplot2)
library(plyr)
data(iris)
df<-iris
find_hull <- function(df) df[chull(df$Sepal.Length, df$Sepal.Width), ]
hulls <- ddply(df, "Species", find_hull)
plot <- ggplot(data = df, aes(x = Sepal.Length, y = Sepal.Width, colour=Species, fill = Species)) +
    geom_point() + 
    geom_polygon(data = hulls, alpha = 0.5) +
    labs(x = "Sepal.Length", y = "Sepal.Width")
plot

I was wondering though how one could calculate and add alpha bags instead, i.e. the largest convex hull that contains at least a proportion 1-alpha of all the points? Either in 2d (to display with ggplot2) or 3d (to display with rgl).

EDIT: My initial idea was be to keep on "peeling" the convex hull for as along as the criterion of containing at least a given % of points would be satisfied, although in the paper here it seems they use a different algorithm (isodepth, which seems to be implemented in R package depth, in function isodepth and aplpack::plothulls seems also close to what I want (although it produces a full plot as opposed to just the contour), so I think with these I may be sorted. Though these function only works in 2D, and I would also be interested in a 3D extension (to be plotted in rgl). If anyone has any pointers let me know!

EDIT2: with function depth::isodepth I found a 2d solution (see post below), although I am still looking for a 3D solution as well - if anyone would happen to know how to do that, please let me know!


Answer:

Ha with the help of function depth::isodepth I came up with the following solution - here I find the alpha bag that contains a proportion of at least 1-alpha of all points :

library(mgcv)
library(depth)
library(plyr)
library(ggplot2)
data(iris)
df=iris[,c(1,2,5)]
alph=0.05
find_bag = function(x,alpha=alph) {
    n=nrow(x)
    target=1-alpha
    propinside=1
    d=1
    while (propinside>target) {
        p=isodepth(x[,1:2],dpth=d,output=T, mustdith=T)[[1]]
        ninside=sum(in.out(p,as.matrix(x[,1:2],ncol=2))*1)
        nonedge=sum(sapply(1:nrow(p),function (row)
            nrow(merge(round(setNames(as.data.frame(p[row,,drop=F]),names(x)[1:2]),5),as.data.frame(x[,1:2])))>0)*1)-3
        propinside=(ninside+nonedge)/n
        d=d+1
    }
    p=isodepth(x[,1:2],dpth=d-1,output=T, mustdith=T)[[1]]
    p }
bags <- ddply(df, "Species", find_bag,alpha=alph)
names(bags) <- c("Species",names(df)[1:2])
plot <- ggplot(data = df, aes(x = Sepal.Length, y = Sepal.Width, colour=Species, fill = Species)) +
    geom_point() + 
    geom_polygon(data = bags, alpha = 0.5) +
    labs(x = "Sepal.Length", y = "Sepal.Width")
plot

EDIT2: Using my original idea of convex hull peeling I also came up with the following solution which now works in 2d & 3d; the result is not quite the same is with the isodepth algorithm, but it's pretty close :

# in 2d
library(plyr)
library(ggplot2)
data(iris)
df=iris[,c(1,2,5)]
alph=0.05
find_bag = function(x,alpha=alph) {
  n=nrow(x)
  propinside=1
  target=1-alpha
  x2=x
  while (propinside>target) {
    propinside=nrow(x2)/n
    hull=chull(x2)
    x2old=x2
    x2=x2[-hull,]
  }
  x2old[chull(x2old),] }
bags <- ddply(df, "Species", find_bag, alpha=alph)
plot <- ggplot(data = df, aes(x = Sepal.Length, y = Sepal.Width, colour=Species, fill = Species)) +
  geom_point() +
  geom_polygon(data = bags, alpha = 0.5) +
  labs(x = "Sepal.Length", y = "Sepal.Width")
plot

# in 3d
library(plyr)
library(ggplot2)
data(iris)
df=iris[,c(1,2,3,5)]
levels=unique(df[,"Species"])
nlevels=length(levels)
zoom=0.8
cex=1
aspectr=c(1,1,0.7)
pointsalpha=1
userMatrix=matrix(c(0.80,-0.60,0.022,0,0.23,0.34,0.91,0,-0.55,-0.72,0.41,0,0,0,0,1),ncol=4,byrow=T)
windowRect=c(0,29,1920,1032)
cols=c("red","forestgreen","blue")
alph=0.05
plotbag = function(x,alpha=alph,grp=1,cols=c("red","forestgreen","blue"),transp=0.2) {
  propinside=1
  target=1-alpha
  x2=x
  levels=unique(x2[,ncol(x2)])
  x2=x2[x2[,ncol(x2)]==levels[[grp]],]
  n=nrow(x2)
  while (propinside>target) {
    propinside=nrow(x2)/n
    hull=unique(as.vector(convhulln(as.matrix(x2[,1:3]), options = "Tv")))
    x2old=x2
    x2=x2[-hull,]
  }
  ids=t(convhulln(as.matrix(x2old[,1:3]), options = "Tv"))
  rgl.triangles(x2old[ids,1],x2old[ids,2],x2old[ids,3],col=cols[[grp]],alpha=transp,shininess=50)
}

open3d(zoom=zoom,userMatrix=userMatrix,windowRect=windowRect,antialias=8)
for (i in 1:nlevels) { 
  plot3d(x=df[df[,ncol(df)]==levels[[i]],][,1],
         y=df[df[,ncol(df)]==levels[[i]],][,2],
         z=df[df[,ncol(df)]==levels[[i]],][,3],
         type="s", 
         col=cols[[i]],
         size=cex,
         lit=TRUE,
         alpha=pointsalpha,point_antialias=TRUE,
         line_antialias=TRUE,shininess=50, add=TRUE)
  plotbag(df,alpha=alph, grp=i, cols=c("red","forestgreen","blue"), transp=0.3) }
axes3d(color="black",drawfront=T,box=T,alpha=1)
title3d(color="black",xlab=names(df)[[1]],ylab=names(df)[[2]],zlab=names(df)[[3]],alpha=1)
aspect3d(aspectr)

Question:

I'm often using boxplots in my work and like ggplot2 aesthetics. But standard geom_boxplot lacks two things important for me: ends of whiskers and median labels. Thanks to information from here I've written a function:

gBoxplot <- function(formula = NULL, data = NULL, font = "CMU Serif", fsize = 18){
  require(ggplot2)
  vars <- all.vars(formula)
  response <- vars[1]
  factor <- vars[2]
  # A function for medians labelling
  fun_med <- function(x){
    return(data.frame(y = median(x), label = round(median(x), 3)))
  }
  p <- ggplot(data, aes_string(x = factor, y = response)) +
  stat_boxplot(geom = "errorbar", width = 0.6) +
  geom_boxplot() +
  stat_summary(fun.data = fun_med, geom = "label", family = font, size = fsize/3, 
                                                                         vjust = -0.1) +
  theme_grey(base_size = fsize, base_family = font)
  return(p)
}

There are also font settings, but this is just because I'm too lazy to make a theme. Here is an example:

gBoxplot(hwy ~ class, mpg)

Good enough for me, but there are some restrictictions (cannot use auto-dodging, etc.), and it will be better to make a new geom based on geom_boxplot. I've read the vignette Extending ggplot2, but cannot understand how to implement it. Any help will be appreciated.


Answer:

So been thinking about this one for a while. Basically when you create a new primitive, you normally write a combination of:

  1. A layer-function
  2. A stat-ggproto,
  3. A geom-ggproto

Only the layer-function need be visible to the user. You only need to write a stat-ggproto if you need some new way of transforming your data to make your primitive. And you only need write a geom-ggproto if you have some new grid-based graphics to create.

In this case, where we are basically composting layer-function that already exist, we don’t really need to write new ggprotos. It is enough to write a new layer-function. This layer-function will create the three layers that you already are using and map the parameters the way you intend. In this case:

  • Layer1 – uses geom_errorbar and stat_boxplot – to get our errorbars
  • Layer2 – uses geom_boxplot and stat_boxplot - to create the boxplots
  • Layer3 – users geom_label and stat_summary - to create the text labels with the mean value in the center of the boxes.

Of course you could write a new stat-ggproto and a new geom-ggproto that do all of these things at once. Or maybe you compost stat_summary and stat_boxplot into one, and the three geom-protos as well, and this do this with one layer. But there is little point unless we have efficiency problems.

Anyway, here is the code:

geom_myboxplot <- function(formula = NULL, data = NULL,
                           stat = "boxplot", position = "dodge",coef=1.5,
                           font = "sans", fsize = 18, width=0.6,
                           fun.data = NULL, fun.y = NULL, fun.ymax = NULL,
                           fun.ymin = NULL, fun.args = list(),
                           outlier.colour = NULL, outlier.color = NULL,
                           outlier.shape = 19, outlier.size = 1.5,outlier.stroke = 0.5,
                           notch = FALSE,  notchwidth = 0.5,varwidth = FALSE,
                           na.rm = FALSE, show.legend = NA,
                           inherit.aes = TRUE,...) {
    vars <- all.vars(formula)
    response <- vars[1]
    factor <- vars[2]
    mymap <- aes_string(x=factor,y=response)
    fun_med <- function(x) {
        return(data.frame(y = median(x), label = round(median(x), 3)))
    }
    position <- position_dodge(width)
    l1 <- layer(data = data, mapping = mymap, stat = StatBoxplot,
            geom = "errorbar", position = position, show.legend = show.legend,
            inherit.aes = inherit.aes, params = list(na.rm = na.rm,
                coef = coef, width = width, ...))
    l2 <- layer(data = data, mapping = mymap, stat = stat, geom = GeomBoxplot,
            position = position, show.legend = show.legend, inherit.aes = inherit.aes,
            params = list(outlier.colour = outlier.colour, outlier.shape = outlier.shape,
                outlier.size = outlier.size, outlier.stroke = outlier.stroke,
                notch = notch, notchwidth = notchwidth, varwidth = varwidth,
                na.rm = na.rm, ...))
    l3 <- layer(data = data, mapping = mymap, stat = StatSummary,
            geom = "label", position = position, show.legend = show.legend,
            inherit.aes = inherit.aes, params = list(fun.data = fun_med,
                fun.y = fun.y, fun.ymax = fun.ymax, fun.ymin = fun.ymin,
                fun.args = fun.args, na.rm=na.rm,family=font,size=fsize/3,vjust=-0.1,...))
    return(list(l1,l2,l3))
}

which allows you to create your customized boxplots it now like this:

ggplot(mpg) +
  geom_myboxplot( hwy ~ class, font = "sans",fsize = 18)+
  theme_grey(base_family = "sans",base_size = 18 )

And they look like this:

Note: we did not actually have to use the layer function, we could have used the orginal stat_boxplot, geom_boxplot, and stat_summary calls in their place. But we still would have had to fill in all the parameters if we wanted to be able to control them from our custom boxplot, so I think it was clearer this way - at least from the point-of-view of structure as opposed to functionality. Maybe it isn't though, it is a matter of taste...

Also I don't have that font which does look a lot nicer. But I did not feel like tracking it down and installing it.

Question:

In order to plot half densities, I am using the function described in this post: Split violin plot with ggplot2

However, when I want to draw the quantiles on the densities, like on a normal geom_violin() or geom_boxplot(), I obtain an error message.

I would also be interested in adding the number of observations above each half density.

Here is an example of what I would like to obtain:

data("diamonds")
library(ggplot2)

# Function described in a previous post
GeomSplitViolin <- ggproto("GeomSplitViolin", GeomViolin, draw_group = function(self, data, ..., draw_quantiles = NULL){
  data <- transform(data, xminv = x - violinwidth * (x - xmin), xmaxv = x + violinwidth * (xmax - x))
  grp <- data[1,'group']
  newdata <- plyr::arrange(transform(data, x = if(grp%%2==1) xminv else xmaxv), if(grp%%2==1) y else -y)
  newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ])
  newdata[c(1,nrow(newdata)-1,nrow(newdata)), 'x'] <- round(newdata[1, 'x']) 
  if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
    stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 
                                              1))
    quantiles <- create_quantile_segment_frame(data, draw_quantiles)
    aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), c("x", "y")), drop = FALSE]
    aesthetics$alpha <- rep(1, nrow(quantiles))
    both <- cbind(quantiles, aesthetics)
    quantile_grob <- GeomPath$draw_panel(both, ...)
    ggplot2:::ggname("geom_split_violin", grobTree(GeomPolygon$draw_panel(newdata, ...), quantile_grob))
  }
  else {
    ggplot2:::ggname("geom_split_violin", GeomPolygon$draw_panel(newdata, ...))
  }
})

geom_split_violin <- function (mapping = NULL, data = NULL, stat = "ydensity", position = "identity", ..., draw_quantiles = NULL, trim = TRUE, scale = "area", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {
  layer(data = data, mapping = mapping, stat = stat, geom = GeomSplitViolin, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, na.rm = na.rm, ...))
}

tmp <- diamonds[which(diamonds$cut %in% c("Fair", "Good")), ]

# Obtained plot
ggplot(tmp, aes(as.factor(color), carat, fill = cut)) +
  geom_split_violin()

# Error due to internal functions (interleave, ...)
ggplot(tmp, aes(as.factor(color), carat, fill = cut)) +
  geom_split_violin(draw_quantiles = 0.5)

# Function to return number of observation
give_n = function(x, y_up = y_upper) {
  data.frame(y = y_up * 1.06,
             label = paste("n =", length(x))
  )
}

# Code to add number of observations above each half density
new_plot = given_plot +
  # Give back only length of data
  stat_summary(fun.data = give_n, aes(x = as.factor(variable)), geom = "text")

Answer:

We can make further adjustments to the function by @YAK, and add some adjustments to create_quantile_segment_frame:

GeomSplitViolin <- ggproto("GeomSplitViolin", GeomViolin,
  draw_group = function(self, data, ..., draw_quantiles = NULL) {
    # Original function by Jan Gleixner (@jan-glx)
    # Adjustments by Wouter van der Bijl (@Axeman)
    data <- transform(data, xminv = x - violinwidth * (x - xmin), xmaxv = x + violinwidth * (xmax - x))
    grp <- data[1, "group"]
    newdata <- plyr::arrange(transform(data, x = if (grp %% 2 == 1) xminv else xmaxv), if (grp %% 2 == 1) y else -y)
    newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ])
    newdata[c(1, nrow(newdata) - 1, nrow(newdata)), "x"] <- round(newdata[1, "x"])
    if (length(draw_quantiles) > 0 & !scales::zero_range(range(data$y))) {
      stopifnot(all(draw_quantiles >= 0), all(draw_quantiles <= 1))
      quantiles <- create_quantile_segment_frame(data, draw_quantiles, split = TRUE, grp = grp)
      aesthetics <- data[rep(1, nrow(quantiles)), setdiff(names(data), c("x", "y")), drop = FALSE]
      aesthetics$alpha <- rep(1, nrow(quantiles))
      both <- cbind(quantiles, aesthetics)
      quantile_grob <- GeomPath$draw_panel(both, ...)
      ggplot2:::ggname("geom_split_violin", grid::grobTree(GeomPolygon$draw_panel(newdata, ...), quantile_grob))
    }
    else {
      ggplot2:::ggname("geom_split_violin", GeomPolygon$draw_panel(newdata, ...))
    }
  }
)

create_quantile_segment_frame <- function(data, draw_quantiles, split = FALSE, grp = NULL) {
  dens <- cumsum(data$density) / sum(data$density)
  ecdf <- stats::approxfun(dens, data$y)
  ys <- ecdf(draw_quantiles)
  violin.xminvs <- (stats::approxfun(data$y, data$xminv))(ys)
  violin.xmaxvs <- (stats::approxfun(data$y, data$xmaxv))(ys)
  violin.xs <- (stats::approxfun(data$y, data$x))(ys)
  if (grp %% 2 == 0) {
    data.frame(
      x = ggplot2:::interleave(violin.xs, violin.xmaxvs),
      y = rep(ys, each = 2), group = rep(ys, each = 2)
    )
  } else {
    data.frame(
      x = ggplot2:::interleave(violin.xminvs, violin.xs),
      y = rep(ys, each = 2), group = rep(ys, each = 2)
    )
  }
}

geom_split_violin <- function(mapping = NULL, data = NULL, stat = "ydensity", position = "identity", ..., 
                              draw_quantiles = NULL, trim = TRUE, scale = "area", na.rm = FALSE, 
                              show.legend = NA, inherit.aes = TRUE) {
  layer(data = data, mapping = mapping, stat = stat, geom = GeomSplitViolin, position = position, 
        show.legend = show.legend, inherit.aes = inherit.aes, 
        params = list(trim = trim, scale = scale, draw_quantiles = draw_quantiles, na.rm = na.rm, ...))
}

Then simply plot:

ggplot(diamonds[which(diamonds$cut %in% c("Fair", "Good")), ],
       aes(as.factor(color), carat, fill = cut)) +
  geom_split_violin(draw_quantiles = c(0.25, 0.5, 0.75))

Question:

I am following up a discussion started at: How can I make geom_area() leave a gap for missing values?. It seems like geom_ribbon is not longer leaving gaps for missing values. Please try to execute the reproducible example in the attached link. I cannot get the answer described. Can you?


Answer:

Looks like a bug in ggplot2, there seems to be a missing handle_na function that needs to be added as a part of a new unified way of dealing with NA values.

Update:

The first post here refined an entire new ggproto to fix this, but I realized that as a one-liner workaround you can just override the handle_na function like I do in the code below (# fix GeomRibbon):

require(dplyr)
require(ggplot2)
require(grid)

set.seed(1)

test <- data.frame(x = rep(1:10, 3), y = abs(rnorm(30)), z = rep(LETTERS[1:3], 10)) 
           %>% arrange(x, z)

test[test$x == 4, "y"] <- NA

test$ymax <- test$y
test$ymin <- 0
zl <- levels(test$z)
for (i in 2:length(zl)) {
    zi <- test$z == zl[i]
    zi_1 <- test$z == zl[i - 1]
    test$ymin[zi] <- test$ymax[zi_1]
    test$ymax[zi] <- test$ymin[zi] + test$ymax[zi]
}


# fix GeomRibbon
GeomRibbon$handle_na <- function(data, params) {  data }

ggplot(test, aes(x = x,y=y, ymax = ymax, ymin = ymin, fill = z)) +
  geom_ribbon() +
  scale_x_continuous(breaks = 1:10)

yielding:

Question:

I'm trying to write a custom stat_* for ggplot2, where I would like to color a 2D loess surface using tiles. When I start from the extension guide, I can write a stat_chull like they do:

stat_chull = function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE, show.legend = NA, 
                       inherit.aes = TRUE, ...) {

  chull = ggproto("chull", Stat,
    compute_group = function(data, scales) {
      data[chull(data$x, data$y), , drop = FALSE]
    },
    required_aes = c("x", "y")
  )

  layer(
    stat = chull, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

This work for both the simple call and facet wrapping:

ggplot(mpg, aes(x=displ, y=hwy)) + 
  geom_point() + 
  stat_chull()
# optionally + facet_wrap(~ class)

When I write my stat_loess2d, I can also visualize all classes or an individual class:

stat_loess2d = function(mapping = NULL, data = NULL, geom = "tile",
                       position = "identity", na.rm = FALSE, show.legend = NA, 
                       inherit.aes = TRUE, ...) {

  loess2d = ggproto("loess2d", Stat,
    compute_group = function(data, scales) {
      dens = MASS::kde2d(data$x, data$y)
      lsurf = loess(fill ~ x + y, data=data)
      df = data.frame(x = rep(dens$x, length(dens$y)),
                      y = rep(dens$y, each=length(dens$x)),
                      dens = c(dens$z))
      df$fill = predict(lsurf, newdata=df[c("x", "y")])
      df
    },
    required_aes = c("x", "y", "fill")
  )

  layer(
    stat = loess2d, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

ggplot(mpg, aes(x=displ, y=hwy, fill=year)) + 
  geom_point(aes(color=year)) + 
  stat_loess2d()

ggplot(mpg[mpg$class == "compact",], aes(x=displ, y=hwy, fill=year)) + 
  geom_point(aes(color=year)) + 
  stat_loess2d()

However, when I try to facet the above, tiles are no longer shown:

ggplot(mpg, aes(x=displ, y=hwy, fill=year)) + 
  geom_point(aes(color=year)) + 
  stat_loess2d() +
  facet_wrap(~ class)

Can someone tell me what I'm doing wrong here?


Answer:

Explanation

The main issue I see here actually lies beyond what you have done, & is related to how geom_tile handles tile creation across different facets, when the specific x / y axis values differ significantly. An older question demonstrated a similar phenomenon: geom_tile works fine with each facet's data on its own, but put them together, and the tiles shrink to match the smallest difference between different facets' values. This leaves vast amounts of white space in the plot layer, and usually gets progressively worse with every additional facet, until the tiles themselves become practically invisible.

To get around this, I would add a data processing step after the density / loess calculations for each facet, to standardize the range of x and y values across all facets.

Some elaboration in case you are not very familiar with the relationship between compute_layer, compute_panel, and compute_group (I certainly wasn't when I started messing around with ggproto objects...):

  • Essentially, all Stat* objects have these three functions to bridge the gap between a given dataframe (mpg in this case), and what's received by the Geom* side of things.

  • Of the three, compute_layer is the top-level function, and typically triggers compute_panel to calculate a separate dataframe for each facet / panel (the terminology used in exported functions is facet, but the underlying package code refers to the same as panel; I'm not sure why either). In turn, compute_panel triggers compute_group to calculate a separate dataframe for each group (as defined by the group / colour / fill / etc. aesthetic parameters).

  • The results from compute_group are returned to compute_panel and combined into one dataframe. Likewise, compute_layer receives one dataframe from each facet's compute_panel, and combines them together again. The combined dataframe is then passed on to Geom* to draw.

(Above is the generic set-up as defined in the top-level Stat. Other Stat* objects inheriting from Stat may override the behaviour in any of the steps. For example, StatIdentity's compute_layer returns the original dataframe as-is, without triggering compute_panel / compute_group at all, because there is no need to do so for unchanged data.)

For this use case, we can modify the code in compute_layer, after the results have been returned from compute_panel / compute_group and combined together, to interpolate values associated with each facet into common bins. Because common bins = nice large tiles without white space in between.

Modification

Here's how I would have written the loess2d ggproto object, with an additional definition for compute_layer:

loess2d = ggproto("loess2d", Stat,
                  compute_group = function(data, scales) {
                    dens = MASS::kde2d(data$x, data$y)
                    lsurf = loess(fill ~ x + y, data=data)
                    df = data.frame(x = rep(dens$x, length(dens$y)),
                                    y = rep(dens$y, each=length(dens$x)),
                                    dens = c(dens$z))
                    df$fill = predict(lsurf, newdata=df[c("x", "y")])
                    df
                  },
                  compute_layer = function(self, data, params, layout) {
                    # no change from Stat$compute_layer in this chunk, except
                    # for liberal usage of `ggplot2:::` to utilise un-exported
                    # functions from the package
                    ggplot2:::check_required_aesthetics(self$required_aes, 
                                                        c(names(data), names(params)), 
                                                        ggplot2:::snake_class(self))
                    data <- remove_missing(data, params$na.rm, 
                                           c(self$required_aes, self$non_missing_aes), 
                                           ggplot2:::snake_class(self),
                                           finite = TRUE)
                    params <- params[intersect(names(params), self$parameters())]
                    args <- c(list(data = quote(data), scales = quote(scales)), params)
                    df <- plyr::ddply(data, "PANEL", function(data) {
                      scales <- layout$get_scales(data$PANEL[1])
                      tryCatch(do.call(self$compute_panel, args), 
                               error = function(e) {
                                 warning("Computation failed in `", ggplot2:::snake_class(self), 
                                         "()`:\n", e$message, call. = FALSE)
                                 data.frame()
                               })
                    })

                    # define common x/y grid range across all panels
                    # (length = 25 chosen to match the default value for n in MASS::kde2d)
                    x.range <- seq(min(df$x), max(df$x), length = 25)
                    y.range <- seq(min(df$y), max(df$y), length = 25)

                    # interpolate each panel's data to a common grid,
                    # with NA values for regions where each panel doesn't
                    # have data (this can be changed via the extrap
                    # parameter in akima::interp, but I think  
                    # extrapolating may create misleading visuals)
                    df <- df %>% 
                      tidyr::nest(-PANEL) %>%
                      mutate(data = purrr::map(data, 
                                               ~akima::interp(x = .x$x, y = .x$y, z = .x$fill,
                                                              xo = x.range, yo = y.range,
                                                              nx = 25, ny = 25) %>%
                                                 akima::interp2xyz(data.frame = TRUE) %>%
                                                 rename(fill = z))) %>%
                      tidyr::unnest()

                    return(df)
                  },
                  required_aes = c("x", "y", "fill")
)

Usage:

ggplot(mpg,
       aes(x=displ, y=hwy, fill=year)) + 
  stat_loess2d() +
  facet_wrap(~ class)
# this does trigger warnings (not errors) because some of the facets contain
# really very few observations. if we filter for facets with more rows of data
# in the original dataset, this wouldn't be an issue

ggplot(mpg %>% filter(!class %in% c("2seater", "minivan")),
       aes(x=displ, y=hwy, fill=year)) + 
  stat_loess2d() +
  facet_wrap(~ class)
# no warnings triggered

Question:

I am trying to create a new geometry for ggplot as described here, while adapting it to deal with Simple Features objects.

As an example, let's take the same exercise of plotting the convex hull of a set of points. Thus, I wrote a new geom_envelope() function borrowing elements from geom_sf() and a corresponding GeomEnvelope ggproto object that performs the computation overriding the draw_group() method (since I want a single polygon for the full set of points).

However, I must be missing something, since I can't get the polygon plotted. I've been trying for a while but I either get errors or nothing plotted.

library(sf); library(ggplot2); library(dplyr)

Npts <- 10
pts <- matrix(runif(2*Npts), ncol = 2) %>% 
  st_multipoint() %>% 
  st_sfc() %>% 
  st_cast("POINT") %>% 
  st_sf()

GeomEnvelope <- ggproto(
  "GeomEnvelope", GeomSf,

  required_aes = "geometry",

  default_aes = aes(
    shape = NULL,
    colour = "grey20",
    fill = "white",
    size = NULL,
    linetype = 1,
    alpha = 0.5,
    stroke = 0.5
  ),

  draw_key = draw_key_polygon,

  draw_group = function(data, panel_params, coord) {
    n <- nrow(data)
    if (n <= 2) return(grid::nullGrob())

    gp <- gpar(
      colour = data$colour,
      size = data$size,
      linetype = data$linetype,
      fill = alpha(data$fill, data$alpha),
      group = data$group,
      stringsAsFactors = FALSE
    )

    geometry <- sf::st_convex_hull(st_combine(sf::st_as_sf(data)))

    sf::st_as_grob(geometry, pch = data$shape, gp = gp)

  }
)


geom_envelope <- function(
  mapping = aes(),
  data = NULL,
  position = "identity",
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE,
  ...) {

  if (!is.null(data) && ggplot2:::is_sf(data)) {
    geometry_col <- attr(data, "sf_column")
  }
  else {
    geometry_col <- "geometry"
  }
  if (is.null(mapping$geometry)) {
    mapping$geometry <- as.name(geometry_col)
  }
  c(
    layer(
      geom = GeomEnvelope,
      data = data,
      mapping = mapping,
      stat = "identity",
      position = position,
      show.legend = if (is.character(show.legend))
        TRUE
      else
        show.legend,
      inherit.aes = inherit.aes,
      params = list(
        na.rm = na.rm,
        legend = if (is.character(show.legend))
          show.legend
        else
          "polygon",
        ...
      )
    ),
    coord_sf(default = TRUE)
  )
}

ggplot(pts) + geom_sf() + geom_envelope() + theme_bw()

Created on 2019-04-23 by the reprex package (v0.2.1)


Answer:

If this is your actual use case (rather than a simplified example of it), then I'd say the essential part of what you are looking for is a custom Stat, not a custom Geom. Data computations / manipulations should happen within the former.

(For reference, I usually look at the code in GeomBoxplot / StatBoxplot to figure out where things should happen, since that use case includes a bunch of calculations for quantiles / outliers, as well as the combination of different grob elements that accept various aesthetic mappings.)

Data with random seed for reproducibility:

set.seed(123)

pts <- matrix(runif(2*Npts), ncol = 2) %>% 
  st_multipoint() %>% 
  st_sfc() %>% 
  st_cast("POINT") %>% 
  st_sf()
Basic demonstration

The following StatEnvelope would take the dataset passed to the relevant geom layer, and convert the collection of geometry values within each group (if no grouping aesthetic is specified, the entire dataset will be treated as one group) into a convex hull:

StatEnvelope <- ggproto(
  "StatEnvelope", Stat,
  required_aes = "geometry",
  compute_group = function(data, scales) {
    if(nrow(data) <= 2) return (NULL)
    data %>%
      group_by_at(vars(-geometry)) %>%
      summarise(geometry = sf::st_convex_hull(sf::st_combine(geometry))) %>%
      ungroup()
  }
)

ggplot(pts) + 
  geom_sf() +
  geom_sf(stat = StatEnvelope, 
          alpha = 0.5, color = "grey20", fill = "white", size = 0.5) +
  theme_bw()

Upgrade

The above approach, using the existing geom_sf, does a perfectly passable job at creating the envelope. If we want to specify some default aesthetic parameters, rather than repeat within every instance of geom_sf, we still don't need to define a new Geom. A function that modifies the existing geom_sf would do fine.

geom_envelope <- function(...){
  suppressWarnings(geom_sf(stat = StatEnvelope, 
                           ..., # any aesthetic argument specified in the function 
                                # will take precedence over the default arguments
                                # below, with suppressWarning to mute warnings on
                                # any duplicated aesthetics
                           alpha = 0.5, color = "grey20", fill = "white", size = 0.5))
}

# outputs same plot as before
ggplot(pts) + 
  geom_sf() +
  geom_envelope() +
  theme_bw()

# with different aesthetic specifications for demonstration
ggplot(pts) + 
  geom_sf() +
  geom_envelope(alpha = 0.1, colour = "brown", fill = "yellow", size = 3) +
  theme_bw()


Explanation for what's going on for the code posted in the question

When I mess around with customised ggproto objects, one useful trick I like to use is to insert print statements within every function I modify, e.g. "setting up parameters", or "drawing panel, step 3", etc. This allows me to have a good idea of what's happening beneath the hood, and track where things went wrong when the function (inevitably) returns an error on the 1st / 2nd / ... / nth try.

In this case, if we insert print("draw group") at the beginning of GeomEnvelope's draw_group function before running ggplot(pts) + geom_sf() + geom_envelope() + theme_bw(), we'll observe the absence of any printed message in the console. In other words, the draw_group function was never called, so any data manipulation defined therein has no effect on the output.

There are several draw_* functions in Geom*, which can be confusing when we want to make modifications. From the code for Geom, we can see the hierarchy is as follows:

  1. draw_layer (which includes a do.call(self$draw_panel, args) line)
  2. draw_panel (which includes a self$draw_group(group, panel_params, coord, ...) line)
  3. draw_group (which is not implemented for Geom).

So draw_layer triggers draw_panel, and draw_panel triggers draw_group. (Mirroring this, in Stat, compute_layer triggers compute_panel and compute_panel triggers compute_group.)

GeomSf, which inherits from Geom (code here), overrides Geom's draw_panel function with a chunk of code that returns a sf_grob(...), and DOES NOT trigger draw_group.

Consequently, when GeomEnvelope inherits GeomSf's draw_panel function, nothing in its draw_group function is going to matter. What's drawn in the plot depends on draw_panel, and the geom_envelope layer in the question performs essentially the same task as geom_sf, plotting each individual point separately. If you remove / comment out the geom_sf layer, you'll see the same points; just with color = "grey20", alpha = 0.5, etc., as specified in GeomSf's default_aes.

(Note: fill = "white" isn't in use, because geom_sf defaults to GeomPoint's default aesthetics for point data, which means it inherits GeomPoint's pch = 19 for its point shape, and plots a solid circle unaffected by any fill value.)