Hot questions for using ggplot2 in ggrepel

Question:

I have a dataset, where each data point has an x-value that is constrained (represents an actual instance of a quantitative variable), y-value that is arbitrary (exists simply to provide a dimension to spread out text), and a label. My datasets can be very large, and there is often text overlap, even when I try to spread the data across the y-axis as much as possible.

Hence, I am trying to use the new ggrepel. However, I am trying to keep the text labels constrained at their x-value position, while only allowing them to repel from each other in the y-direction.

As an example, the code below produces a plot for 32 data points, where the x-values show the number of cylinders in a car and the y-values are determined randomly (they have no meaning other than providing a second dimension for plotting text). Without ggrepel, there is significant overlap in the text:

library(ggrepel)
library(ggplot2)
set.seed(1)
data <- data.frame(x = mtcars$cyl, y = runif(32, 1, 10), label = paste0("label", 1:32))
origPlot <- ggplot(data) +
  geom_point(aes(x, y), color = 'red') +
  geom_text(aes(x, y, label = label)) +
  theme_classic(base_size = 16)

I can remedy the text overlap using ggrepel, as shown below. However, this changes not only the y-values but also the x-values. I am trying to avoid changing the x-values, as they have a physical meaning (the number of cylinders):

repelPlot <- ggplot(data) +
  geom_point(aes(x, y), color = 'red') +
  geom_text_repel(aes(x, y, label = label)) +
  theme_classic(base_size = 16)

As a note, the reason I cannot allow the x-value of the text to change is that I am only plotting the text (not the points). Most ggrepel examples, by contrast, keep the points at their positions (so that their values remain true) and only repel the labels in x and y. The points are then connected to the labels with segments (you can see that in my second plot example).

I kept the points in the two examples above for demonstration purposes. However, I am only retaining the text (and hence will be removing the points and the segments), leaving me with something like this:

repelPlot2 <- ggplot(data) + geom_text_repel(aes(x, y, label = label), segment.size = 0) + theme_classic(base_size = 16)

My question is twofold:

1) Is it possible for me to repel the text labels only in the y-direction?

2) Is it possible for me to obtain a structure containing the new (repelled) y-values of the text?

Thank you for any advice!


Answer:

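I don't think it is possible to repel text labels only in one direction with ggrepel. (Later releases of ggrepel added a direction argument to geom_text_repel() and geom_label_repel(); direction = "y" restricts label movement to the y-axis.)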

I would approach this problem differently, by generating the arbitrary y-axis positions manually. For the data set in your question, you could do this with the code below.

I have used the dplyr package to group the data set by the values of x, and then created a new column y containing the row numbers within each group. The row numbers are then used as the values for the y-axis.

library(ggplot2)
library(dplyr)

data <- data.frame(x = mtcars$cyl, label = paste0("label", 1:32))

data <- data %>% 
  group_by(x) %>% 
  mutate(y = row_number())

ggplot(data, aes(x = x, y = y, label = label)) + 
  geom_text(size = 2) + 
  xlim(3.5, 8.5) + 
  theme_classic(base_size = 8)

ggsave("filename.png", width = 4, height = 2)
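If you would rather not depend on dplyr, the same per-group numbering can be sketched in base R with ave(), which applies a function within each level of a grouping variable and returns a result aligned with the original rows. This is an alternative to the dplyr step above, not part of the original answer.

```r
# Base-R alternative to group_by(x) %>% mutate(y = row_number())
data <- data.frame(x = mtcars$cyl, label = paste0("label", 1:32))

# ave() runs seq_along() separately within each value of x, so every label
# gets a y position of 1, 2, 3, ... within its cylinder group
data$y <- ave(seq_along(data$x), data$x, FUN = seq_along)

head(data)
```

The resulting data frame can be passed to the same ggplot() call as above.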

Question:

Is there any way to add a label on or near the center of a geom_curve line? Currently, I can only do so by labeling either the start or end point of the curve.

library(tidyverse)
library(ggrepel)

df <- data.frame(x1 = 1, y1 = 1, x2 = 2, y2 = 3, details = "Object Name")

ggplot(df, aes(x = x1, y = y1, label = details)) +
  geom_point(size = 4) +
  geom_point(aes(x = x2, y = y2),
             pch = 17, size = 4) +
  geom_curve(aes(x = x1, y = y1, xend = x2, yend = y2)) +
  geom_label(nudge_y = 0.05) +
  geom_label_repel(box.padding = 2)

I would love some way to automatically label the curve near coordinates x=1.75, y=1.5. Is there a solution out there I haven't seen yet? My intended graph is quite busy, and labeling the origin points makes it harder to see what's happening, while labeling the arcs would make a cleaner output.


Answer:

I've come to a solution for this problem. It's large and clunky, but effective.

The core problem is that geom_curve() does not draw a fixed path: the curve moves and scales with the aspect ratio of the plot window. So, short of locking the aspect ratio with coord_fixed(ratio = 1), I could find no easy way to predict where the midpoint of a geom_curve() segment will be.

So instead I set about finding the midpoint of a curve, and then forcing the curve to go through that point, which I would later label. To find the midpoint I had to copy two internal functions from the grid package:

library(grid)
library(tidyverse)
library(ggrepel)

# Find origin of rotation
# Rotate around that origin
calcControlPoints <- function(x1, y1, x2, y2, curvature, angle, ncp,
                              debug=FALSE) {
  # Negative curvature means curve to the left
  # Positive curvature means curve to the right
  # Special case curvature = 0 (straight line) has been handled
  xm <- (x1 + x2)/2
  ym <- (y1 + y2)/2
  dx <- x2 - x1
  dy <- y2 - y1
  slope <- dy/dx

  # Calculate "corner" of region to produce control points in
  # (depends on 'angle', which MUST lie between 0 and 180)
  # Find by rotating start point by angle around mid point
  if (is.null(angle)) {
    # Calculate angle automatically
    angle <- ifelse(slope < 0,
                    2*atan(abs(slope)),
                    2*atan(1/slope))
  } else {
    angle <- angle/180*pi
  }
  sina <- sin(angle)
  cosa <- cos(angle)
  # FIXME:  special case of vertical or horizontal line ?
  cornerx <- xm + (x1 - xm)*cosa - (y1 - ym)*sina
  cornery <- ym + (y1 - ym)*cosa + (x1 - xm)*sina

  # Debugging
  if (debug) {
    grid.points(cornerx, cornery, default.units="inches",
                pch=16, size=unit(3, "mm"),
                gp=gpar(col="grey"))
  }

  # Calculate angle to rotate region by to align it with x/y axes
  beta <- -atan((cornery - y1)/(cornerx - x1))
  sinb <- sin(beta)
  cosb <- cos(beta)
  # Rotate end point about start point to align region with x/y axes
  newx2 <- x1 + dx*cosb - dy*sinb
  newy2 <- y1 + dy*cosb + dx*sinb

  # Calculate x-scale factor to make region "square"
  # FIXME:  special case of vertical or horizontal line ?
  scalex <- (newy2 - y1)/(newx2 - x1)
  # Scale end points to make region "square"
  newx1 <- x1*scalex
  newx2 <- newx2*scalex

  # Calculate the origin in the "square" region
  # (for rotating start point to produce control points)
  # (depends on 'curvature')
  # 'origin' calculated from 'curvature'
  ratio <- 2*(sin(atan(curvature))^2)
  origin <- curvature - curvature/ratio
  # 'hand' also calculated from 'curvature'
  if (curvature > 0)
    hand <- "right"
  else
    hand <- "left"
  oxy <- calcOrigin(newx1, y1, newx2, newy2, origin, hand)
  ox <- oxy$x
  oy <- oxy$y

  # Calculate control points
  # Direction of rotation depends on 'hand'
  dir <- switch(hand,
                left=-1,
                right=1)
  # Angle of rotation depends on location of origin
  maxtheta <- pi + sign(origin*dir)*2*atan(abs(origin))
  theta <- seq(0, dir*maxtheta,
               dir*maxtheta/(ncp + 1))[c(-1, -(ncp + 2))]
  costheta <- cos(theta)
  sintheta <- sin(theta)
  # May have BOTH multiple end points AND multiple
  # control points to generate (per set of end points)
  # Generate consecutive sets of control points by performing
  # matrix multiplication
  cpx <- ox + ((newx1 - ox) %*% t(costheta)) -
    ((y1 - oy) %*% t(sintheta))
  cpy <- oy + ((y1 - oy) %*% t(costheta)) +
    ((newx1 - ox) %*% t(sintheta))

  # Reverse transformations (scaling and rotation) to
  # produce control points in the original space
  cpx <- cpx/scalex
  sinnb <- sin(-beta)
  cosnb <- cos(-beta)
  finalcpx <- x1 + (cpx - x1)*cosnb - (cpy - y1)*sinnb
  finalcpy <- y1 + (cpy - y1)*cosnb + (cpx - x1)*sinnb

  # Debugging
  if (debug) {
    ox <- ox/scalex
    fox <- x1 + (ox - x1)*cosnb - (oy - y1)*sinnb
    foy <- y1 + (oy - y1)*cosnb + (ox - x1)*sinnb
    grid.points(fox, foy, default.units="inches",
                pch=16, size=unit(1, "mm"),
                gp=gpar(col="grey"))
    grid.circle(fox, foy, sqrt((ox - x1)^2 + (oy - y1)^2),
                default.units="inches",
                gp=gpar(col="grey"))
  }

  list(x=as.numeric(t(finalcpx)), y=as.numeric(t(finalcpy)))
}

calcOrigin <- function(x1, y1, x2, y2, origin, hand) {
  # Positive origin means origin to the "right"
  # Negative origin means origin to the "left"
  xm <- (x1 + x2)/2
  ym <- (y1 + y2)/2
  dx <- x2 - x1
  dy <- y2 - y1
  slope <- dy/dx
  oslope <- -1/slope
  # The origin is a point somewhere along the line between
  # the end points, rotated by 90 (or -90) degrees
  # Two special cases:
  # If slope is non-finite then the end points lie on a vertical line, so
  # the origin lies along a horizontal line (oslope = 0)
  # If oslope is non-finite then the end points lie on a horizontal line,
  # so the origin lies along a vertical line (oslope = Inf)
  tmpox <- ifelse(!is.finite(slope),
                  xm,
                  ifelse(!is.finite(oslope),
                         xm + origin*(x2 - x1)/2,
                         xm + origin*(x2 - x1)/2))
  tmpoy <- ifelse(!is.finite(slope),
                  ym + origin*(y2 - y1)/2,
                  ifelse(!is.finite(oslope),
                         ym,
                         ym + origin*(y2 - y1)/2))
  # ALWAYS rotate by -90 about midpoint between end points
  # Actually no need for "hand" because "origin" also
  # encodes direction
  # sintheta <- switch(hand, left=-1, right=1)
  sintheta <- -1
  ox <- xm - (tmpoy - ym)*sintheta
  oy <- ym + (tmpox - xm)*sintheta

  list(x=ox, y=oy)
}

With that in place, I calculated a midpoint for each record:

df <- data.frame(x1 = 1, y1 = 1, x2 = 10, y2 = 10, details = "Object Name")

df_mid <- df %>% 
  mutate(midx = calcControlPoints(x1, y1, x2, y2, 
                                  angle = 130, 
                                  curvature = 0.5, 
                                  ncp = 1)$x) %>% 
  mutate(midy = calcControlPoints(x1, y1, x2, y2, 
                                  angle = 130, 
                                  curvature = 0.5, 
                                  ncp = 1)$y)

I then make the graph, but draw two separate curves: one from the origin to the calculated midpoint, and another from the midpoint to the destination. Choosing the angle and curvature settings, both for finding the midpoint and for drawing these curves, is tricky if the result is not to look obviously like two different curves.

ggplot(df_mid, aes(x = x1, y = y1)) +
  geom_point(size = 4) +
  geom_point(aes(x = x2, y = y2),
             pch = 17, size = 4) +
  geom_curve(aes(x = x1, y = y1, xend = midx, yend = midy),
             curvature = 0.25, angle = 135) +
  geom_curve(aes(x = midx, y = midy, xend = x2, yend = y2),
             curvature = 0.25, angle = 45) +
  geom_label_repel(aes(x = midx, y = midy, label = details),
                   box.padding = 4,
                   nudge_x = 0.5,
                   nudge_y = -2)

Though the answer isn't ideal or elegant, it scales to a large number of records.
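A different approach, swapped in here rather than taken from the answer above, is to avoid geom_curve() altogether: compute the points of a curve yourself in data coordinates and draw them with geom_path(). Because the path is then defined in data space, its midpoint does not move when the plot's aspect ratio changes. The sketch below uses a quadratic Bézier curve; the helper name bezier_points() and its bend parameter are hypothetical, and bend is not equivalent to geom_curve()'s curvature.

```r
# Hypothetical helper: points on a quadratic Bezier curve from (x1, y1) to (x2, y2).
# The control point is the chord midpoint shifted to the left of the direction of
# travel by 'bend' times the chord length (negative 'bend' curves to the right).
bezier_points <- function(x1, y1, x2, y2, bend = 0.3,
                          t = seq(0, 1, length.out = 51)) {
  dx <- x2 - x1
  dy <- y2 - y1
  # (-dy, dx) is the chord rotated 90 degrees counter-clockwise (same length)
  cx <- (x1 + x2) / 2 - bend * dy
  cy <- (y1 + y2) / 2 + bend * dx
  data.frame(
    t = t,
    x = (1 - t)^2 * x1 + 2 * (1 - t) * t * cx + t^2 * x2,
    y = (1 - t)^2 * y1 + 2 * (1 - t) * t * cy + t^2 * y2
  )
}

# Curve and label position for the example record (1, 1) -> (10, 10)
path <- bezier_points(1, 1, 10, 10)
# t = 0.5 is the apex; since the control point lies on the chord's perpendicular
# bisector, the curve is symmetric and this is also its arc-length midpoint
mid <- bezier_points(1, 1, 10, 10, t = 0.5)
```

With ggplot2 loaded, path could be drawn with geom_path(data = path, aes(x, y)), and the single row in mid used as the label position for geom_label_repel(). For several records, the helper would be called once per row.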

Question:

I am drawing a heatmap with ggplot2. Several ticks on the y-axis need to be labeled. However, some of them are too close together and overlap. I know ggrepel can separate text labels, but I haven't managed to make it work for my problem.

My code is as follows. Any suggestion is welcome. Thanks.

Code:

df <- data.frame()

for (i in 1:50){
  tmp_df <- data.frame(cell=paste0("cell", i), 
                       gene=paste0("gene", 1:100), exp = rnorm(100), ident = i %% 5)
  df<-rbind(df, tmp_df)
}

labelRow=rep("", 100)
for (i in c(2, 5, 7, 11, 19, 23)){
  labelRow[i] <- paste0("gene", i)
}

library(ggplot2)
heatmap <- ggplot(data = df, mapping = aes(x = cell, y = gene, fill = exp)) +
  geom_tile() + 
  scale_fill_gradient2(name = "Expression") + 
  scale_y_discrete(position = "right", labels = labelRow) +
  facet_grid(facets = ~ident,
             drop = TRUE,
             space = "free",
             scales = "free", switch = "x") +
  scale_x_discrete(expand = c(0, 0), drop = TRUE) +
  theme(axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.title.y = element_blank(),
        axis.text.y = element_text(),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        strip.text.x = element_text(angle = -90))

heatmap


Answer:

For these kinds of problems, I prefer to draw the axis as a separate plot and then combine. It takes a bit of fiddling but allows you to draw pretty much any axis you want.

In my solution, I'm using the functions get_legend(), align_plots(), and plot_grid() from the cowplot package. Disclaimer: I'm the package author.

library(ggplot2)
library(cowplot); theme_set(theme_gray()) # undo cowplot theme setting
library(ggrepel)

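df <- data.frame()
for (i in 1:50) {
  tmp_df <- data.frame(cell = paste0("cell", i),
                       gene = paste0("gene", 1:100), exp = rnorm(100), ident = i %% 5)
  df <- rbind(df, tmp_df)
}
# Keep the genes in numeric order (gene1, gene2, ..., gene100); otherwise ggplot2
# sorts them alphabetically and the position-based labelRow no longer lines up
df$gene <- factor(df$gene, levels = paste0("gene", 1:100))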


labelRow <- rep("", 100)
genes <- c(2, 5, 7, 11, 19, 23)
labelRow[genes] <- paste0("gene ", genes)

# make the heatmap plot
heatmap <- ggplot(data = df, mapping = aes(x = cell,y = gene, fill = exp)) +
  geom_tile() + 
  scale_fill_gradient2(name = "Expression") + 
  scale_x_discrete(expand = c(0, 0), drop = TRUE) + 
  facet_grid(facets = ~ident,
             drop = TRUE,
             space = "free",
             scales = "free", switch = "x") + 
  theme(axis.line = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        strip.text.x = element_text(angle = -90),
        legend.justification = "left",
        plot.margin = margin(5.5, 0, 5.5, 5.5, "pt"))

# make the axis plot
axis <- ggplot(data.frame(y = 1:100,
                          gene = labelRow),
               aes(x = 0, y = y, label = gene)) +
  geom_text_repel(min.segment.length = grid::unit(0, "pt"),
                 color = "grey30",  ## ggplot2 theme_grey() axis text
                 size = 0.8*11/.pt  ## ggplot2 theme_grey() axis text
                 ) +
  scale_x_continuous(limits = c(0, 1), expand = c(0, 0),
                     breaks = NULL, labels = NULL, name = NULL) +
  scale_y_continuous(limits = c(0.5, 100.5), expand = c(0, 0),
                     breaks = NULL, labels = NULL, name = NULL) +
  theme(panel.background = element_blank(),
        plot.margin = margin(0, 0, 0, 0, "pt"))

# align and combine
aligned <- align_plots(heatmap + theme(legend.position = "none"), axis, align = "h", axis = "tb")
aligned <- append(aligned, list(get_legend(heatmap)))
plot_grid(plotlist = aligned, nrow = 1, rel_widths = c(5, .5, .7))
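One detail worth checking with position-based labels such as labelRow: ggplot2 places the values of a character y variable in sorted (alphabetical) order, so for these gene names position 2 on the y-axis is "gene10", not "gene2". The base-R sketch below shows the difference and how explicit factor levels keep the numeric order.

```r
genes <- paste0("gene", 1:100)

# Default: sorted order, which is also what factor() uses without explicit levels
alpha_order <- levels(factor(genes))

# Explicit levels keep the numeric order gene1, gene2, ..., gene100
numeric_order <- levels(factor(genes, levels = genes))

# Axis position of "gene2" under each ordering
match("gene2", alpha_order)
match("gene2", numeric_order)
```

Converting df$gene with factor(df$gene, levels = paste0("gene", 1:100)) before plotting therefore keeps labelRow[i] next to the tile row for gene i.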

Question:

I made this volcano plot and am hoping to improve it as follows:

  1. fully shade the region with blue data points: with my current code, I wasn't able to extend the shade beyond what you see. I would like it to go all the way to the plot area limits.

  2. geom_text allowed me to label a subset of data points, but doing it with ggrepel should add lines connecting the data points with labels thus improving labeling clarity. How can I reuse the existing geom_text code in ggrepel to achieve this?

Here is my code:

ggplot(vol.new, aes(x = log2.fold.change, y = X.NAME., fill = Color)) + # Define data frame to be used for plotting; define data for x and y axes; create a scatterplot object.

  geom_point(size = 2, shape = 21, colour = "black") + # Define data point style.

  ggtitle(main.title, subtitle = "Just a little subtitle") + # Define title and subtitle.

  labs(x = x.lab, y = y.lab) + # Define labels for x and y axes.

  scale_x_continuous(limits = c(-3, 3), breaks = seq(-3, 3, by = 0.5)) + # Define x limits, add ticks.
  scale_y_continuous(limits = c(0, 6), breaks = seq(0, 6, by = 1)) + # Define y limits, add ticks.

  theme(
    plot.title = element_text(family = "Arial", size = 11, hjust = 0), # Title size and font.
    plot.subtitle = element_text(family = "Arial", size = 11), # Subtitle size and font.
    axis.text = element_text(family = "Arial", size = 10), # Size and font of x and y values.
    axis.title = element_text(family = "Arial", size = 10), # Size and font of x and y axes.
    panel.border = element_rect(colour = "black", fill = NA, size = 1), # Black border around the plot area.
    axis.ticks = element_line(colour = "black", size = 1), # Style of x and y ticks.
    legend.position = "none"
  ) + # Remove legend.

  geom_hline(yintercept = 1.30103, colour = "black", linetype = "dashed", size = 0.75) + # Horizontal significance cut-off line.
  geom_vline(xintercept = 0.584963, colour = "black", linetype = "dashed", size = 0.75) + # Vertical significance cut-off line (+).
  # geom_vline (xintercept = -0.584963, colour = "black", linetype = "dashed", size = 0.75) #Vertical significance cut-off line (-)

  scale_fill_manual(breaks = c("blue", "red"), values = c("deepskyblue3", "firebrick1")) + # Custom colors of data points based on "PursFur" column.

  geom_text(aes(label = ifelse(PursFur == 1, as.character(Protein.ID), "")), hjust = 0, vjust = -0.25) + # Add identifiers to a subset of data points.

  annotate("text", x = 2.9, y = 1.45, label = "P = 0.05", size = 4, fontface = "bold") + # Label to horizontal cut-off line.
  annotate("text", x = 0.68, y = 5.9, label = "1.5-fold", size = 4, fontface = "bold", srt = 90) + # Label to vertical cut-off line.
  annotate("rect", xmin = 0.584963, xmax = 3, ymin = 1.30103, ymax = 6, alpha = .2) # Shade plot subregion.

Answer:

As suggested in the comments by @hrbrmstr and @zx8754, here are the modifications I made to the code above.

To solve the shading problem (via @hrbrmstr):

annotate("rect", xmin = 0.584963, xmax = Inf, ymin = 1.30103, ymax = Inf, alpha = .2)

To solve the labeling question (via @zx8754):

geom_label_repel(aes(label = ifelse(PursFur == 1, as.character(Protein.ID), "")), nudge_x = 1.3, direction = "x")

And here is the outcome after these two changes:

See this website and this nice ggrepel tutorial to dive further into the second part of my initial question.
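The hard-coded numbers in this code appear to be log-transformed cut-offs, as the annotations "P = 0.05" and "1.5-fold" suggest: on a -log10(P) y-axis the significance line sits at -log10(0.05), and on a log2 fold-change x-axis the 1.5-fold line sits at log2(1.5). Computing them, as sketched below, makes the intent explicit; the variable names are illustrative.

```r
# Cut-offs used for the dashed lines and the shaded rectangle
p_cutoff    <- 0.05
fold_cutoff <- 1.5

y_cut <- -log10(p_cutoff)   # about 1.30103, the geom_hline() intercept
x_cut <- log2(fold_cutoff)  # about 0.584963, the geom_vline() intercept

round(c(y_cut, x_cut), 6)
```

These could then replace the literals, e.g. geom_hline(yintercept = y_cut) and annotate("rect", xmin = x_cut, xmax = Inf, ymin = y_cut, ymax = Inf, alpha = .2); the commented-out negative line would use -x_cut.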

Question:

Is it possible to pass partially italicized text labels into ggplot? I have tried using the expression and italic commands (expression(paste(italic("some text")))), but these cannot be passed into a data frame because the result of the commands is not atomic. Setting the parameter fontface = "italic" also doesn't suffice, since this italicizes the entire label, rather than just a select set of characters in the label. For instance, I would like some necessarily italicized Latin phrases to be italicized in a label (such as "in vivo" in "in vivo point").

library(ggplot2)
library(ggrepel)

df <- data.frame(V1 = c(1,2), V2 = c(2,4), V3 = c("in vivo point","another point"))

ggplot(data = df, aes(x = V1, y = V2)) + geom_point() + geom_text_repel(aes(label = V3))

Answer:

You can use parse = TRUE to pass ?plotmath expressions (as strings) to geom_text or geom_text_repel. You'll have to rewrite the strings as plotmath, but if it's not too many it's not too bad.

Note: The CRAN version of ggrepel has a bug that breaks parse = TRUE, but it has been fixed on the GitHub version. ggplot2::geom_text works fine.

# devtools::install_github('slowkow/ggrepel')
library(ggplot2)
library(ggrepel)

df <- data.frame(V1 = c(1,2), V2 = c(2,4), 
                 V3 = c("italic('in vivo')~point", "another~point"))

ggplot(data = df, aes(x = V1, y = V2, label = V3)) + 
    geom_point() + 
    geom_text_repel(parse = TRUE)
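If there are many labels, the rewrite to plotmath can be automated. The sketch below is a hypothetical helper, not part of ggplot2 or ggrepel: it italicizes the first occurrence of a given phrase and keeps the rest of the label as quoted text, joining the pieces with plotmath's * (juxtaposition) operator so that the original spaces are preserved. The result is meant to be used with parse = TRUE.

```r
# Hypothetical helper: turn a plain label into a plotmath string in which the
# first occurrence of `phrase` is italic. Plain parts are kept as quoted strings
# (via deparse()), so spaces and punctuation survive parsing.
to_plotmath <- function(label, phrase) {
  pos <- as.integer(regexpr(phrase, label, fixed = TRUE))
  if (pos == -1L) {
    return(deparse(label))  # no italic part: a single string constant
  }
  before <- substr(label, 1, pos - 1)
  after  <- substr(label, pos + nchar(phrase), nchar(label))
  parts <- c(
    if (nchar(before) > 0) deparse(before),
    sprintf("italic(%s)", deparse(phrase)),
    if (nchar(after) > 0) deparse(after)
  )
  paste(parts, collapse = "*")
}

labels <- c("in vivo point", "another point")
plotmath_labels <- vapply(labels, to_plotmath, character(1),
                          phrase = "in vivo", USE.NAMES = FALSE)
plotmath_labels
```

In the question's example, df$V3 could be replaced by vapply(df$V3, to_plotmath, character(1), phrase = "in vivo", USE.NAMES = FALSE) and plotted with geom_text_repel(parse = TRUE).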

Question:

Is there somehow a trick to get the font within 'geom_label_repel' alpha=1 but the background maybe alpha=.2?

My problem is, that I have sometimes very dense plots. If I use just text, the text is not readable anymore. If I use label without transparency, the label is perfectly readable but I can not see behind the label. If I choose transparency for the label, then again, the font is no longer readable since it is also transparent and there is not enough contrast against the background.

What I would really love is a white shadow around the font :-)

Here is a minimal example to demonstrate the problem.

library(ggplot2)
library(ggrepel)
library(stringi)

set.seed(1)
df <- data.frame(x=rnorm(10000),
                 y=rnorm(10000),
                 label=NA)
df$label[1:26] <- stringi::stri_rand_strings(26,8)

ggplot(df, aes(x, y)) +
  geom_point(alpha=.3) +
  geom_label_repel(aes(label=label),
                   label.size = NA, 
                   alpha = 0.6, 
                   label.padding=.1, 
                   na.rm=TRUE) +
  theme_bw()


Answer:

Plot two labels, the second with no fill at all. Set the same seed for both so that they overlap perfectly. (Using geom_text_repel for the second layer doesn't seem to work, as its repelling works slightly differently.)

ggplot(df, aes(x, y)) +
  geom_point(alpha=.3) +
  geom_label_repel(aes(label=label),
                   label.size = NA, 
                   alpha = 0.6, 
                   label.padding=.1, 
                   na.rm=TRUE,
                   seed = 1234) +
  geom_label_repel(aes(label=label),
                   label.size = NA, 
                   alpha = 1, 
                   label.padding=.1, 
                   na.rm=TRUE,
                   fill = NA,
                   seed = 1234) +
  theme_bw()

Question:

I have a rather dense scatterplot that I am constructing with R 'ggplot2' and I want to label a subset of points using 'ggrepel'. My problem is that I want to plot ALL points in the scatterplot, but only label a subset with ggrepel, and when I do this, ggrepel doesn't account for the other points on the plot when calculating where to put the labels, which leads to labels which overlap other points on the plot (which I don't want to label).

Here is an example plot illustrating the issue.

# generate data:
library(data.table)
library(stringi)
library(ggplot2)
library(ggrepel)
set.seed(20180918)
dt = data.table(
  name = stri_rand_strings(3000,length=6),
  one = rnorm(n = 3000,mean = 0,sd = 1),
  two = rnorm(n = 3000,mean = 0,sd = 1))
dt[, diff := one - two]
dt[, diff_cat := ifelse(one > 0 & two>0 & abs(diff)>1, "type_1",
                        ifelse(one<0 & two < 0 & abs(diff)>1, "type_2",
                               ifelse(two>0 & one<0 & abs(diff)>1, "type_3",
                                      ifelse(two<0 & one>0 & abs(diff)>1, "type_4", "other"))))]

# make plot
ggplot(dt, aes(x=one,y=two,color=diff_cat))+
  geom_point()

If I plot only the subset of points I want labelled, then ggrepel is able to place all of the labels in a non-overlapping fashion with respect to other points and labels.

ggplot(dt[abs(diff)>2 & (!diff_cat %in% c("type_3","type_4","other"))], 
  aes(x=one,y=two,color=diff_cat))+
  geom_point()+
  geom_text_repel(data = dt[abs(diff)>2 & (!diff_cat %in% c("type_3","type_4","other"))], 
                  aes(x=one,y=two,label=name))

However when I want to plot this subset of data AND the original data at the same time, I get overlapping points with labels:

# now add labels to a subset of points on the plot
ggplot(dt, aes(x=one,y=two,color=diff_cat))+
  geom_point()+
  geom_text_repel(data = dt[abs(diff)>2 & (!diff_cat %in% c("type_3","type_4","other"))], 
                  aes(x=one,y=two,label=name))

How can I get the labels for the subset of points to not overlap the points from the original data?


Answer:

You can try the following:

  1. Assign a blank label ("") to all the other points from the original data, so that geom_text_repel takes them into consideration when repelling labels from one another;
  2. Increase the box.padding parameter from the default 0.25 to some larger value, for greater distance between labels;
  3. Increase the x and y-axis limits, to give the labels more space at the four sides to repel towards.
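Step 1 is the key: the data passed to geom_text_repel() still contains every point, and only the points you want named get a non-empty label, so ggrepel still treats the others as obstacles. The base-R sketch below illustrates this with hypothetical stand-in data (plain rnorm() values and generated names instead of the question's data.table/stringi setup), using the same selection rule as the example code: type_1 or type_2 points with abs(diff) > 2.

```r
# Hypothetical stand-in for the question's data
set.seed(20180918)
n   <- 3000
pts <- data.frame(name = sprintf("pt%04d", seq_len(n)),
                  one  = rnorm(n),
                  two  = rnorm(n))
pts$diff <- pts$one - pts$two

# type_1 / type_2 in the question: both coordinates share a sign.
# Combined with abs(diff) > 2, this picks the points to be named.
same_sign <- (pts$one > 0 & pts$two > 0) | (pts$one < 0 & pts$two < 0)
to_label  <- same_sign & abs(pts$diff) > 2

# Every row keeps a label; unlabeled points get "" so that ggrepel
# still repels the visible labels away from them
pts$label <- ifelse(to_label, pts$name, "")

table(nonempty = pts$label != "")
```

pts could then be given to geom_text_repel(data = pts, aes(x = one, y = two, label = label), box.padding = 1) in place of the filtered subset.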

Example code (with box.padding = 1):

library(dplyr)  # provides mutate() and the %>% used in data = . %>% ...

ggplot(dt, 
       aes(x = one, y = two, color = diff_cat)) +
  geom_point() +
  geom_text_repel(data = . %>% 
                    mutate(label = ifelse(diff_cat %in% c("type_1", "type_2") & abs(diff) > 2,
                                          name, "")),
                  aes(label = label), 
                  box.padding = 1,
                  show.legend = FALSE) + #this removes the 'a' from the legend
  coord_cartesian(xlim = c(-5, 5), ylim = c(-5, 5)) +
  theme_bw()

Here's another attempt, with box.padding = 2:

(Note: I'm using ggrepel 0.8.0. I'm not sure if all the functionalities are present for earlier package versions.)

Question:

I want the labels to be drawn on top of all the line segments. This should be fairly straightforward, but I can't find any argument to do this with ggrepel::geom_label_repel().

Sample of data:

df <- structure(list(Athletename = c("Aries Merritt", "Damian Warner"
), Score = c(12.8, 13.44), Event = c("110m hurdles", "110m hurdles"
), Points = c(1135, 1048), Record = c("World Record", "Decathlon Record"
), score_and_points = c("12.8s, 1135pts", "13.44s, 1048pts")), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("Athletename", 
"Score", "Event", "Points", "Record", "score_and_points"))

ggplot2 code:

ggplot(data = data.frame(x = 0), mapping = aes(x = x)) +
  geom_point(data = df, aes(x=Score, y=Points, colour=Record)) +
  geom_label_repel(data = df, 
                   aes(x=Score, y=Points, label = Athletename), 
                   direction = "x",
                   nudge_x = -10) +
  geom_label_repel(data = df, 
                   aes(x=Score, y=Points, label = score_and_points), 
                   direction = "y",
                   nudge_y = -200) +
  scale_y_continuous(name = "Points", 
                     breaks = seq(0,1500,100),
                     limits = c(0,1500)) +
  scale_x_reverse(name = "110m hurdles time (s)",
                  breaks = seq(29,12,-1),
                  limits=c(29,12)) +
  theme(legend.title = element_blank(), legend.position = "top")


Answer:

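Hacky but works: add a copy of the geom_label_repel call, but with the addition of segment.alpha = 0. Because ggplot2 draws layers in order, the copy's labels are drawn on top of the segments from the earlier layers, so the labels end up on top of the arrows.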

library(ggplot2)
library(ggrepel)
ggplot(data = data.frame(x = 0), mapping = aes(x = x)) +
  geom_point(data = df, aes(x=Score, y=Points, colour=Record)) +
  geom_label_repel(data = df,
                   aes(x=Score, y=Points, label = Athletename),
                   direction = "x",
                   nudge_x = -10) +
  geom_label_repel(data = df, 
                   aes(x=Score, y=Points, label = score_and_points), 
                   direction = "y",
                   nudge_y = -200,
                   seed = 1234) +  # same seed in both copies so the labels coincide
  geom_label_repel(data = df, 
                   aes(x=Score, y=Points, label = score_and_points), 
                   direction = "y", segment.alpha = 0,
                   nudge_y = -200,
                   seed = 1234) +
  scale_y_continuous(name = "Points", 
                     breaks = seq(0,1500,100),
                     limits = c(0,1500)) +
  scale_x_reverse(name = "110m hurdles time (s)",
                  breaks = seq(29,12,-1),
                  limits=c(29,12)) +
  theme(legend.title = element_blank(), legend.position = "top")

Question:

Is there an elegant way in ggplot2 to make geom_text/geom_label inherit theme specifications like a base_family?

Or asked the other way round: Can I specify a theme that also applies to geom_text/geom_label?


Example:

I want text/labels to look exactly like the axis.text as specified in the theme...

Obviously I could add the specifications manually as optional arguments to geom_text, but I want it to inherit the specifications "automatically"...

library("ggplot2")

ggplot(mtcars, aes(x = mpg,
                   y = hp,
                   label = row.names(mtcars))) +
  geom_point() +
  geom_text() +
  theme_minimal(base_family = "Courier")

Addition: A solution that works with ggrepel::geom_text_repel/geom_label_repel as well would be perfect...


Answer:

You can do this in two steps: set the overall font with a theme, then make the text geoms inherit it.

Setting the overall font

Firstly, depending on the system you will need to check which fonts are available. As I am running on Windows I am using the following:

install.packages("extrafont")
library(extrafont)
windowsFonts() # check which fonts are available

The theme_set() function sets the default theme for all subsequent ggplot2 plots, so theme_set(theme_minimal(base_family = "Times New Roman")) defines the font used by the theme elements of every plot.

Make Labels Inherit Font

To make the labels inherit this font, we need two things:

  1. update_geom_defaults lets you update the geometry object styling for future plots in ggplot: http://ggplot2.tidyverse.org/reference/update_defaults.html
  2. theme_get()$text$family extracts the font of the current global ggplot theme.

By combining these two, the label styles can be updated as follows:

# Change the settings
library(ggrepel)  # load first, so that the "text_repel" geom can be found
update_geom_defaults("text", list(colour = "grey20", family = theme_get()$text$family))
update_geom_defaults("text_repel", list(colour = "grey20", family = theme_get()$text$family))

Results

library(ggplot2)
theme_set(theme_minimal(base_family = "Times New Roman"))

# Change the settings
update_geom_defaults("text", list(colour = "grey20", family = theme_get()$text$family))

# Basic Plot
ggplot(mtcars, aes(x = mpg,
                   y = hp,
                   label = row.names(mtcars))) +
  geom_point() +
  geom_text()

# works with ggrepel (load it first, so that the "text_repel" geom can be found)
library(ggrepel)
update_geom_defaults("text_repel", list(colour = "grey20", family = theme_get()$text$family))

ggplot(mtcars, aes(x = mpg,
                   y = hp,
                   label = row.names(mtcars))) +
  geom_point() +
  geom_text_repel()

Question:

I want to shrink the plot area so there is more room for ggrepel labels that currently get cut off. I can't seem to offset the labels any further via nudge_x, and I do not want to shrink the text size.

I'm trying to find a way to compress the chart so that the groups all move closer to the center, leaving more room for labels at the extremes of the x-axis.

Specifically, I am trying to knit this figure into a portrait PDF. I tried controlling fig.width in the chunk options, but this just made the entire chart smaller.

I want to maximize the width on the portrait page, but shrink the plot area relative to the area for labels.

---
title             : "The title"
shorttitle        : "Title"

author: 
  - name          : "Me"
    affiliation   : "1"
    corresponding : yes    # Define only one corresponding author
    address       : "Address"
    email         : "email"

affiliation:
  - id            : "1"
    institution   : "Company"

authornote: |
  Note here

abstract: |
  Abstract here.


floatsintext      : yes
figurelist        : no
tablelist         : no
footnotelist      : no
linenumbers       : no
mask              : no
draft             : no
note              : "\\clearpage"

documentclass     : "apa6"
classoption       : "man,noextraspace"
header-includes:
  - \usepackage{pdfpages}
  - \usepackage{setspace}
  - \AtBeginEnvironment{tabular}{\singlespacing}
  - \makeatletter\let\expandableinput\@@input\makeatother
  - \interfootnotelinepenalty=10000
  - \usepackage{float} #use the 'float' package
  - \floatplacement{figure}{H} #make every figure with caption = h
  - \raggedbottom
output            : papaja::apa6_pdf
---


```{r test, fig.cap="Caption.", fig.height=8, include=TRUE, echo=FALSE}
library("papaja")
library(tidyverse)
library(ggrepel)

ageGenderF <- structure(list(genAge = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Women, 15-19", 
"Women, 20-24", "Women, 25-35", "Women, 36+"), class = "factor"), 
    word_ = c("this is label 2", "this is label 3", "this is label 4", 
    "this is label 1", "this is label 7", "this is label 5", 
    "this is label 8", "this is label 10", "this is label 11", 
    "this is label 20", "this is label 12", "this is label 6", 
    "this is label 17", "this is label 9", "this is label 15", 
    "this is label 21", "this is label 31", "this is label 25", 
    "this is label 26", "this is label 19", "this is label 24", 
    "this is label 28", "this is label 29", "this is label 30", 
    "this is label 14", "this is label 22", "this is label 18", 
    "this is label 54", "this is label 32", "this is label 44", 
    "this is label 52", "this is label 34", "this is label 59", 
    "this is label 48", "this is label 23", "this is label 47", 
    "this is label 38", "this is label 35", "this is label 61", 
    "this is label 56", "this is label 39", "this is label 72", 
    "this is label 42", "this is label 16", "this is label 66", 
    "this is label 37", "this is label 51", "this is label 27", 
    "this is label 40", "this is label 73", "this is label 60", 
    "this is label 113", "this is label 50", "this is label 45", 
    "this is label 81", "this is label 84", "this is label 53", 
    "this is label 49", "this is label 67", "this is label 68", 
    "this is label 46", "this is label 65", "this is label 41", 
    "this is label 57", "this is label 1", "this is label 2", 
    "this is label 3", "this is label 4", "this is label 5", 
    "this is label 6", "this is label 7", "this is label 8", 
    "this is label 9", "this is label 10", "this is label 11", 
    "this is label 12", "this is label 13", "this is label 14", 
    "this is label 15", "this is label 16", "this is label 17", 
    "this is label 18", "this is label 19", "this is label 20", 
    "this is label 21", "this is label 22", "this is label 23", 
    "this is label 24", "this is label 25", "this is label 26", 
    "this is label 27", "this is label 28", "this is label 29", 
    "this is label 30", "this is label 31", "this is label 32", 
    "this is label 33", "this is label 34", "this is label 35", 
    "this is label 36", "this is label 37", "this is label 38", 
    "this is label 39", "this is label 40", "this is label 41", 
    "this is label 42", "this is label 43", "this is label 44", 
    "this is label 45", "this is label 46", "this is label 47", 
    "this is label 48", "this is label 49", "this is label 50", 
    "this is label 51", "this is label 52", "this is label 53", 
    "this is label 54", "this is label 55", "this is label 56", 
    "this is label 57", "this is label 58", "this is label 59", 
    "this is label 60", "this is label 61", "this is label 62", 
    "this is label 63", "this is label 64", "this is label 1", 
    "this is label 2", "this is label 3", "this is label 6", 
    "this is label 4", "this is label 5", "this is label 12", 
    "this is label 7", "this is label 8", "this is label 9", 
    "this is label 10", "this is label 14", "this is label 11", 
    "this is label 18", "this is label 29", "this is label 45", 
    "this is label 27", "this is label 15", "this is label 26", 
    "this is label 71", "this is label 37", "this is label 13", 
    "this is label 25", "this is label 23", "this is label 22", 
    "this is label 41", "this is label 42", "this is label 55", 
    "this is label 52", "this is label 36", "this is label 34", 
    "this is label 17", "this is label 63", "this is label 24", 
    "this is label 19", "this is label 28", "this is label 38", 
    "this is label 32", "this is label 21", "this is label 30", 
    "this is label 35", "this is label 16", "this is label 64", 
    "this is label 20", "this is label 31", "this is label 53", 
    "this is label 77", "this is label 39", "this is label 70", 
    "this is label 57", "this is label 48", "this is label 43", 
    "this is label 132", "this is label 51", "this is label 66", 
    "this is label 58", "this is label 85", "this is label 120", 
    "this is label 65", "this is label 40", "this is label 121", 
    "this is label 78", "this is label 59", "this is label 141", 
    "this is label 1", "this is label 12", "this is label 6", 
    "this is label 2", "this is label 3", "this is label 5", 
    "this is label 4", "this is label 45", "this is label 52", 
    "this is label 26", "this is label 77", "this is label 8", 
    "this is label 7", "this is label 10", "this is label 14", 
    "this is label 31", "this is label 59", "this is label 178", 
    "this is label 18", "this is label 27", "this is label 42", 
    "this is label 70", "this is label 29", "this is label 37", 
    "this is label 330", "this is label 78", "this is label 25", 
    "this is label 34", "this is label 21", "this is label 450", 
    "this is label 83", "this is label 185", "this is label 57", 
    "this is label 16", "this is label 50", "this is label 126", 
    "this is label 895", "this is label 63", "this is label 402", 
    "this is label 19", "this is label 724", "this is label 40", 
    "this is label 11", "this is label 43", "this is label 758", 
    "this is label 1099", "this is label 73", "this is label 62", 
    "this is label 46", "this is label 183", "this is label 819", 
    "this is label 295", "this is label 1100", "this is label 17", 
    "this is label 282", "this is label 153", "this is label 1101", 
    "this is label 41", "this is label 1102", "this is label 446", 
    "this is label 216", "this is label 13", "this is label 109", 
    "this is label 20"), n = c(774L, 635L, 618L, 495L, 329L, 
    284L, 259L, 217L, 197L, 181L, 163L, 163L, 162L, 160L, 138L, 
    124L, 114L, 112L, 110L, 107L, 99L, 98L, 97L, 92L, 85L, 84L, 
    84L, 78L, 74L, 72L, 68L, 67L, 66L, 66L, 65L, 60L, 60L, 60L, 
    58L, 57L, 55L, 51L, 51L, 51L, 50L, 50L, 48L, 47L, 47L, 46L, 
    46L, 44L, 44L, 44L, 43L, 43L, 43L, 43L, 42L, 41L, 41L, 41L, 
    41L, 41L, 1568L, 1366L, 1220L, 1012L, 687L, 682L, 633L, 516L, 
    464L, 374L, 372L, 326L, 326L, 304L, 293L, 292L, 274L, 261L, 
    259L, 257L, 236L, 232L, 229L, 223L, 223L, 221L, 221L, 213L, 
    210L, 205L, 198L, 191L, 189L, 167L, 165L, 164L, 146L, 142L, 
    140L, 140L, 139L, 136L, 134L, 129L, 122L, 121L, 115L, 115L, 
    115L, 113L, 112L, 110L, 110L, 109L, 107L, 104L, 103L, 102L, 
    99L, 99L, 99L, 97L, 96L, 93L, 426L, 332L, 310L, 290L, 197L, 
    166L, 147L, 134L, 125L, 113L, 105L, 104L, 97L, 83L, 78L, 
    77L, 77L, 74L, 69L, 69L, 69L, 69L, 68L, 61L, 61L, 59L, 59L, 
    58L, 58L, 58L, 57L, 57L, 56L, 54L, 51L, 48L, 47L, 46L, 43L, 
    42L, 38L, 38L, 36L, 34L, 34L, 33L, 32L, 32L, 32L, 32L, 31L, 
    29L, 29L, 28L, 28L, 27L, 27L, 27L, 27L, 27L, 26L, 26L, 25L, 
    24L, 37L, 26L, 26L, 20L, 19L, 18L, 17L, 15L, 14L, 12L, 12L, 
    12L, 12L, 12L, 11L, 10L, 9L, 9L, 9L, 9L, 8L, 7L, 7L, 7L, 
    7L, 7L, 7L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L, 5L, 5L, 
    5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L), rank = c(1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
    16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 
    28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 
    40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 
    52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 
    64L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 
    14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 
    26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 
    38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 
    50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 
    62L, 63L, 64L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
    12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 
    24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 
    36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 
    48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 
    60L, 61L, 62L, 63L, 64L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 
    9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 
    21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 
    33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 
    45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 
    57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -256L), groups = structure(list(
    genAge = structure(1:4, .Label = c("Women, 15-19", "Women, 20-24", 
    "Women, 25-35", "Women, 36+"), class = "factor"), .rows = list(
        1:64, 65:128, 129:192, 193:256)), row.names = c(NA, -4L
), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))

ageGenderFLow <- 
  ageGenderF %>%
  filter(genAge=="Women, 15-19") %>%
  filter(rank<=10)

ageGenderFHigh <- 
  ageGenderF %>%
  filter(genAge=="Women, 36+") %>%
  filter(rank<=10)

ageGenderF_ <-
  ageGenderF %>%
  filter(word_ %in% ageGenderFLow$word_ |
         word_ %in% ageGenderFHigh$word_)

# get rank order of words for low set
ageGenderFLowRank <- 
  ageGenderF_ %>%
  filter(genAge=="Women, 15-19") %>%
  arrange(rank) %>%
  mutate(order = 1:n()) 

ageGenderF_ %>%
  mutate(word = factor(word_, ordered=TRUE, levels=ageGenderFLowRank$word_)) %>%
  # https://ibecav.github.io/slopegraph/
  ggplot(., aes(x = genAge, y = reorder(rank, -rank), group = word_)) +
  geom_line(aes(color = word_, alpha = 1), size = 1.5) +
  #geom_line(size = 0.5, color="lightgrey") +
  geom_text_repel(data = . %>% filter(genAge == "Women, 15-19"), 
                  aes(label = word) , 
                  hjust = "left", 
                  #fontface = "bold", 
                  size = 3, 
                  nudge_x = -3, 
                  direction = "y") +
  geom_text_repel(data = . %>% filter(genAge == "Women, 36+"), 
                  aes(label = word) , 
                  hjust = "right", 
                  #fontface = "bold", 
                  size = 3, 
                  nudge_x = 3, 
                  direction = "y") +
  geom_label(aes(label = rank), 
             size = 2.5, 
             label.padding = unit(0.15, "lines"), 
             label.size = 0.0) +
  scale_x_discrete(position = "top") +
  theme_bw() +
  # Remove the legend
  theme(legend.position = "none") +
  # Remove the panel border
  theme(panel.border     = element_blank()) +
  # Remove just about everything from the y axis
  theme(axis.title.y     = element_blank()) +
  theme(axis.text.y      = element_blank()) +
  theme(panel.grid.major.y = element_blank()) +
  theme(panel.grid.minor.y = element_blank()) +
  # Remove a few things from the x axis and increase font size
  theme(axis.title.x     = element_blank()) +
  theme(panel.grid.major.x = element_blank()) +
  theme(axis.text.x.top      = element_text(size=10)) +
  # Remove x & y tick marks
  theme(axis.ticks       = element_blank()) +
  # Format title & subtitle
  theme(plot.title       = element_text(size=10, face = "bold", hjust = 0.5)) +
  theme(plot.subtitle    = element_text(hjust = 0.5))
```




Answer:

If you are willing to change your approach, you could make a big switch: use the text you currently draw as labels as the axis labels instead. A secondary axis lets you put separate labels on each side of the plot, so the result looks a lot like what you have now.

The advantage is that the text always fits: axis labels get their own space outside the panel, and the panel shrinks to make room for them.

First, here's an example that keeps rank as a factor. You have to convert the factor to numbers with as.numeric() to get a duplicated axis, because (so far) discrete axes don't support secondary axes. There's then some work to be done to get the breaks and labels for the axis on each side, so I moved the data manipulation into a separate step (and created rank2, the reordered factor, to make the breaks easier to compute later).

Note also the use of expand in scale_x_discrete() to remove space from around the edges of the panel area.
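To see why as.numeric() on the reordered factor puts rank 1 at the top, and how the per-side breaks and labels line up, here is a tiny base-R illustration. The toy data frame and its values are made up for demonstration; only reorder(), as.numeric() and the per-side filtering mirror the answer.

```r
# Toy data: two sides ("left" and "right"), each with ranked words.
# These values are hypothetical and only illustrate the mechanics.
toy <- data.frame(
  side = c("left", "left", "left", "right", "right", "right"),
  word = c("apple", "pear", "plum", "pear", "fig", "apple"),
  rank = c(1L, 2L, 10L, 1L, 3L, 7L),
  stringsAsFactors = FALSE
)

# reorder(rank, -rank) orders the levels by -rank, so the largest
# rank comes first and rank 1 becomes the last level
toy$rank2 <- reorder(toy$rank, -toy$rank)
print(levels(toy$rank2))  # "10" "7" "3" "2" "1"

# as.numeric() returns the level index: rank 1 gets the largest number,
# which is why it is drawn at the top of the y axis
toy$y <- as.numeric(toy$rank2)

# Breaks and labels for each side, as passed to scale_y_continuous()
# (left axis) and dup_axis() (right axis)
left_breaks  <- toy$y[toy$side == "left"]
left_labels  <- toy$word[toy$side == "left"]
right_breaks <- toy$y[toy$side == "right"]
right_labels <- toy$word[toy$side == "right"]
```

Each side only gets breaks at the positions of its own words, which is why the left and right axes can show different text.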

ageGenderF_ = ageGenderF_ %>%
    ungroup() %>%
    mutate(word = factor(word_, ordered = TRUE, levels = ageGenderFLowRank$word_),
           rank2 = reorder(rank, -rank) )

ageGenderF_ %>%
    # https://ibecav.github.io/slopegraph/
    ggplot(., aes(x = genAge, y = as.numeric(rank2), group = word_)) +
    geom_line(aes(color = word_, alpha = 1), size = 1.5) +
    geom_label(aes(label = rank), 
           size = 2.5, 
           label.padding = unit(0.15, "lines"), 
           label.size = 0.0) +
    scale_x_discrete(position = "top", expand = c(0, .05) ) +
    scale_y_continuous(breaks = filter(ageGenderF_, genAge == "Women, 15-19") %>% pull(rank2) %>% as.numeric(), 
                    labels = filter(ageGenderF_, genAge == "Women, 15-19") %>% pull(word),
                    sec.axis = dup_axis(~., 
                                        breaks = filter(ageGenderF_, genAge == "Women, 36+") %>% pull(rank2) %>% as.numeric(), 
                                        labels = filter(ageGenderF_, genAge == "Women, 36+") %>% pull(word) ) ) +
    theme_bw() +
    # Remove the legend
    theme(legend.position = "none",
          # Remove the panel border
          panel.border     = element_blank(),
          # Remove just about everything from the y axis
          axis.title.y     = element_blank(),
          panel.grid.major.y = element_blank(),
          panel.grid.minor.y = element_blank(),
          # Remove a few things from the x axis and increase font size
          axis.title.x     = element_blank(),
          panel.grid.major.x = element_blank(),
          axis.text.x.top      = element_text(size=10),
          # Remove x & y tick marks
          axis.ticks       = element_blank(),
          axis.ticks.length = unit(0, "cm"),
          # Format title & subtitle
          plot.title       = element_text(size=10, face = "bold", hjust = 0.5),
          plot.subtitle    = element_text(hjust = 0.5) )

Rendered from a simple R Markdown document, this looks similar to your example (although not identical).

You can do the exact same thing with rank as numeric, using scale_y_reverse() to reverse the y axis.

ageGenderF_ = ageGenderF_ %>%
    ungroup() %>%
    mutate(word = factor(word_, ordered = TRUE, levels = ageGenderFLowRank$word_))

ageGenderF_ %>%
    # https://ibecav.github.io/slopegraph/
    ggplot(., aes(x = genAge, y = rank, group = word_)) +
    geom_line(aes(color = word_, alpha = 1), size = 1.5) +
    geom_label(aes(label = rank), 
               size = 2.5, 
               label.padding = unit(0.15, "lines"), 
               label.size = 0.0) +
    scale_x_discrete(position = "top", expand = c(0, .05) ) +
    scale_y_reverse(breaks = filter(ageGenderF_, genAge == "Women, 15-19") %>% pull(rank), 
                    labels = filter(ageGenderF_, genAge == "Women, 15-19") %>% pull(word),
                    sec.axis = dup_axis(~., 
                                        breaks = filter(ageGenderF_, genAge == "Women, 36+") %>% pull(rank), 
                                        labels = filter(ageGenderF_, genAge == "Women, 36+") %>% pull(word) ) ) +
    theme_bw() +
    # Remove the legend
    theme(legend.position = "none",
          # Remove the panel border
          panel.border     = element_blank(),
          # Remove just about everything from the y axis
          axis.title.y     = element_blank(),
          panel.grid.major.y = element_blank(),
          panel.grid.minor.y = element_blank(),
          # Remove a few things from the x axis and increase font size
          axis.title.x     = element_blank(),
          panel.grid.major.x = element_blank(),
          axis.text.x.top      = element_text(size=10),
          # Remove x & y tick marks
          axis.ticks       = element_blank(),
          axis.ticks.length = unit(0, "cm"),
          # Format title & subtitle
          plot.title       = element_text(size=10, face = "bold", hjust = 0.5),
          plot.subtitle    = element_text(hjust = 0.5) )

Question:

I am plotting multiple pie charts with ggplot2 and succeeded in placing the labels in the right positions:

df <- data.frame(annotation=rep(c("promoter", "intergenic", "intragene", "5prime", "3prime"), 3), value=c(69.5, 16, 10.7, 2.5, 1.3, 57.2, 18.8, 20.2, 2.1, 1.7, 50.2, 32.2, 15.3, 1.2, 1.1), treatment=rep(c("treated1", "treated2", "untreated"), c(5, 5, 5)))

library(ggplot2)

ggplot(data = df, aes(x = "", y = value, fill = annotation)) + 
geom_bar(stat = "identity") +
geom_text(aes(label = value), position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y") +
facet_grid(.~treatment)

I then wanted to make use of ggrepel so that the small slices numbers do not overlap:

library(ggrepel)

ggplot(data = df, aes(x = "", y = value, fill = annotation)) + 
geom_bar(stat = "identity") +
geom_text_repel(aes(label = value), position = position_stack(vjust = 0.5)) +  
coord_polar(theta = "y") +
facet_grid(.~treatment)

But I get the warning "Warning: Ignoring unknown parameters: position" and messed-up labels.

Does anyone know how to combine the correct label positions with geom_text_repel, or an alternative approach?

Thank you!

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggrepel_0.6.5 ggplot2_2.2.1 limma_3.26.9 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8      digest_0.6.12    grid_3.3.2       plyr_1.8.4      
 [5] gtable_0.2.0     magrittr_1.5     scales_0.4.1     stringi_1.1.5   
 [9] reshape2_1.4.2   lazyeval_0.2.0   labeling_0.3     tools_3.3.2     
 [13] stringr_1.2.0    munsell_0.4.3    colorspace_1.3-2 tibble_1.3.0    

Answer:

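First, because geom_text_repel() (in ggrepel 0.6.5) ignores the position argument, which is what the warning is telling you, you need to compute a column with the position of each label yourself, before coord_polar(). I would do it with dplyr, but you can use whatever you're comfortable with: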

library('dplyr')

df <- df %>% 
  arrange(treatment, desc(annotation)) %>% 
  group_by(treatment) %>% 
  mutate(text_y = cumsum(value) - value/2)

df
# A tibble: 15 x 4
# Groups:   treatment [3]
   annotation value treatment text_y
        <chr> <dbl>     <chr>  <dbl>
 1   promoter  69.5  treated1  34.75
 2  intragene  10.7  treated1  74.85
 3 intergenic  16.0  treated1  88.20
 4     5prime   2.5  treated1  97.45
 5     3prime   1.3  treated1  99.35
 6   promoter  57.2  treated2  28.60
 7  intragene  20.2  treated2  67.30
 8 intergenic  18.8  treated2  86.80
 9     5prime   2.1  treated2  97.25
10     3prime   1.7  treated2  99.15
11   promoter  50.2 untreated  25.10
12  intragene  15.3 untreated  57.85
13 intergenic  32.2 untreated  81.60
14     5prime   1.2 untreated  98.30
15     3prime   1.1 untreated  99.45

Using text_y as the y aesthetic now places each text or label in the middle of its segment of the bar.
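For reference, the same midpoint calculation can be done in base R without dplyr. This sketch re-creates df from the question, sorts it the same way as arrange(treatment, desc(annotation)) (the order the answer uses so the positions follow the stacking of the slices), and takes the cumulative sum minus half of each slice within each treatment.

```r
# Re-create the question's data (base R only)
df <- data.frame(
  annotation = rep(c("promoter", "intergenic", "intragene", "5prime", "3prime"), 3),
  value = c(69.5, 16, 10.7, 2.5, 1.3, 57.2, 18.8, 20.2, 2.1, 1.7,
            50.2, 32.2, 15.3, 1.2, 1.1),
  treatment = rep(c("treated1", "treated2", "untreated"), c(5, 5, 5)),
  stringsAsFactors = FALSE
)

# Sort by treatment (ascending) and annotation (descending); radix
# ordering accepts a per-key 'decreasing' vector
df <- df[order(df$treatment, df$annotation,
               decreasing = c(FALSE, TRUE), method = "radix"), ]

# Within each treatment, the centre of a slice is the running total
# up to and including that slice minus half of the slice itself
df$text_y <- ave(df$value, df$treatment,
                 FUN = function(v) cumsum(v) - v / 2)
```

The resulting text_y values match the tibble printed above (for example 34.75 for promoter in treated1).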

Plot prior to coord_polar():

ggplot(data = df, aes(x = "", y = value, fill = annotation)) + 
  geom_bar(stat = "identity") +
  geom_label(aes(label = value, y = text_y)) +
  facet_grid(.~treatment)

And now with geom_label_repel() and the polar coordinate transformation:

ggplot(data = df, aes(x = "", y = value, fill = annotation)) +
  geom_bar(stat = "identity") +
  geom_label_repel(aes(label = value, y = text_y)) +
  facet_grid(. ~ treatment) +
  coord_polar(theta = "y")

Question:

I'm working on a chart similar to a slopegraph, where I'd like to put labels along one or both sides with ample blank space to fit them on both sides. In cases where labels are very long, I've wrapped them using stringr::str_wrap to place linebreaks. To keep labels from overlapping, I'm using ggrepel::geom_text_repel with direction = "y" so the x-positions are stable but the y-positions are repelled away from one another. I've also got hjust = "outward" to align the left-side text at its right end and vice versa.

However, it seems that the repel positioning places the label's bounding box with an hjust = "outward", but the text within that label has hjust = 0.5, i.e. text is centered within its bounds. Until now, I'd never noticed this, but with wrapped labels, the second line is awkwardly centered, whereas I'd expect to see both lines left-aligned or right-aligned.

Here's an example built off the mpg dataset.

library(ggplot2)
library(dplyr)
library(ggrepel)

df <- structure(list(
  long_lbl = c("chevrolet, k1500 tahoe 4wd, auto(l4)", "chevrolet, k1500 tahoe 4wd, auto(l4)",
               "subaru, forester awd, manual(m5)", "subaru, forester awd, manual(m5)",
               "toyota, camry, manual(m5)", "toyota, camry, manual(m5)",
               "toyota, toyota tacoma 4wd, manual(m5)", "toyota, toyota tacoma 4wd, manual(m5)",
               "volkswagen, jetta, manual(m5)", "volkswagen, jetta, manual(m5)"),
  year = c(1999L, 2008L, 1999L, 2008L, 1999L, 2008L, 1999L, 2008L, 1999L, 2008L),
  mean_cty = c(11, 14, 18, 20, 21, 21, 15, 17, 33, 21)),
  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L))

df_wrap <- df %>%  
  mutate(wrap_lbl = stringr::str_wrap(long_lbl, width = 25))

ggplot(df_wrap, aes(x = year, y = mean_cty, group = long_lbl)) +
  geom_line() +
  geom_text_repel(aes(label = wrap_lbl),
                  direction = "y", hjust = "outward", seed = 57, min.segment.length = 100) +
  scale_x_continuous(expand = expand_scale(add = 10))

The same thing happens with other values of hjust. Looking at the function's source, I see a line that points to this issue:

hjust = x$data$hjust %||% 0.5,

where %||% returns 0.5 if x$data$hjust is NULL. That's as far as I understand, but it seems that the hjust I've set isn't being carried over to this positioning and is coming up NULL instead.
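For anyone unfamiliar with the operator: %||% is a "null-default" helper (rlang exports one, and base R gained it in version 4.4.0). Here is a local definition with the same behaviour, plus two hypothetical stand-ins for x$data, to show how a missing hjust turns into 0.5.

```r
# Local definition of the null-default operator: return y only when x is NULL
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical stand-ins for x$data, with and without an hjust column
data_without_hjust <- list(label = c("a", "b"))
data_with_hjust    <- list(label = c("a", "b"), hjust = c(1, 0))

h1 <- data_without_hjust$hjust %||% 0.5  # hjust is NULL, so falls back to 0.5
h2 <- data_with_hjust$hjust %||% 0.5     # hjust present, kept as c(1, 0)
```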

Have I missed something? Can anyone see where I might override this without reimplementing the whole algorithm? Or is there a bug here that drops my hjust?


Answer:

TL;DR: probably a bug

Long answer:

I think it might be a bug in the code. I checked the gtable of your plot, and the hjust values are stored there numerically and correctly:

# Assume 'g' is the plot saved under the variable 'g'
gt <- ggplotGrob(g)
# Your number at the end of the geom may vary
textgrob <- gt$grobs[[6]]$children$geom_text_repel.textrepeltree.1578
head(textgrob$data$hjust)
[1] 1 0 1 0 1 0

This suggests that (1) the plot can't be fixed by editing the gtable, and (2) the draw-time code for the textrepeltree class of grobs may contain an error. That the problem lives at draw time makes sense, since the labels are repositioned whenever the plot device is resized. Looking at the makeContent.textrepeltree() code in the link you provided, the hjust parameter is passed on to makeTextRepelGrobs(). Here are the relevant formals:

makeTextRepelGrobs <- function(
  ...other_arguments...,
  just = "center",
  ...other_arguments...,
  hjust = 0.5,
  vjust = 0.5
) { ...body...}

So hjust is a valid argument, but there is also a just argument, and makeContent.textrepeltree() does not pass just on, so it keeps its default of "center".

When we look at the function body there are these two lines:

  hj <- resolveHJust(just, NULL)
  vj <- resolveVJust(just, NULL)

resolveHJust() and resolveVJust() are imported from the grid package. resolveHJust() essentially checks whether its second argument is NULL: if it is, it falls back to the horizontal component of the first argument (here just = "center"); otherwise it returns the second argument. Since the hjust passed to makeTextRepelGrobs() is never handed to resolveHJust(), this seems to be the point where your hjust is dropped.

Further down the code is where the actual text grobs are made:

  t <- textGrob(
    ...other_arguments...
    just = c(hj, vj),
    ...other_arguments...
  )

I imagine the fix would be relatively straightforward: supply hjust as the second argument to resolveHJust(). However, since makeTextRepelGrobs() is internal to ggrepel and not exported, you would have to copy a lot of extra code to get this to work. (I'm not sure whether copying only makeTextRepelGrobs() would be sufficient; I haven't tested this.)
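The fallback behaviour and the suggested fix can be illustrated with a simplified stand-in. resolve_hjust_sketch() is a hypothetical function written for illustration only, not grid's actual resolveHJust(); it handles just the "left"/"center"/"right" keywords.

```r
# Hypothetical, simplified version of the fallback described above:
# use the numeric hjust if one is supplied, otherwise derive it from 'just'
resolve_hjust_sketch <- function(just, hjust) {
  if (is.null(hjust) || length(hjust) == 0) {
    keyword_to_number <- c(left = 0, center = 0.5, centre = 0.5, right = 1)
    unname(keyword_to_number[just[1]])
  } else {
    hjust
  }
}

user_hjust <- c(1, 0, 1, 0)  # per-label values like those found in the gtable

# What the answer describes: NULL is passed, so the default 'just' wins
current  <- resolve_hjust_sketch("center", NULL)
# The suggested fix: pass hjust as the second argument
proposed <- resolve_hjust_sketch("center", user_hjust)
```

With NULL as the second argument every label ends up centred (0.5), regardless of what was set in geom_text_repel().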

All of this leads me to conclude that the hjust you specified in geom_text_repel() gets lost at the last moment, at draw time, inside the internal makeTextRepelGrobs() function.

Question:

I have XY data (a 2D tSNE embedding of high dimensional data) which I'd like to scatter plot. The data are assigned to several clusters, so I'd like to color code the points by cluster and then add a single label for each cluster that has the same color coding as the cluster and is located outside the cluster's points as much as possible.

Any idea how to do this using R in either ggplot2 and ggrepel or plotly?

Here's the example data (the XY coordinates and cluster assignments are in df and the labels in label.df) and the ggplot2 part of it:

library(dplyr)
library(ggplot2)
set.seed(1)
df <- do.call(rbind,lapply(seq(1,20,4),function(i) data.frame(x=rnorm(50,mean=i,sd=1),y=rnorm(50,mean=i,sd=1),cluster=i)))
df$cluster <- factor(df$cluster)

label.df <- data.frame(cluster=levels(df$cluster),label=paste0("cluster: ",levels(df$cluster)))

ggplot(df,aes(x=x,y=y,color=cluster))+geom_point()+theme_minimal()+theme(legend.position="none")


Answer:

The geom_label_repel() function in the ggrepel package adds labels to a plot while trying to "repel" them so they don't overlap other elements. Only a small addition to your code is needed: summarize the data to get the coordinates where each label should go (here I chose the upper-left region of each cluster, i.e. the minimum x and the maximum y), and join the result with your existing data frame of cluster labels. Then pass this data frame to geom_label_repel() and map the column containing the text to the label aesthetic in aes().

library(dplyr)
library(ggplot2)
library(ggrepel)

set.seed(1)
df <- do.call(rbind,lapply(seq(1,20,4),function(i) data.frame(x=rnorm(50,mean=i,sd=1),y=rnorm(50,mean=i,sd=1),cluster=i)))
df$cluster <- factor(df$cluster)

label.df <- data.frame(cluster=levels(df$cluster),label=paste0("cluster: ",levels(df$cluster)))
label.df_2 <- df %>% 
  group_by(cluster) %>% 
  summarize(x = min(x), y = max(y)) %>% 
  left_join(label.df, by = "cluster")

ggplot(df,aes(x=x,y=y,color=cluster))+geom_point()+theme_minimal()+theme(legend.position="none") +
  ggrepel::geom_label_repel(data = label.df_2, aes(label = label))
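If you'd rather not depend on dplyr for this step, the same label coordinates (minimum x and maximum y per cluster) can be computed in base R. This is an alternative sketch, not part of the original answer; label_pos is a name chosen here and can be passed to geom_label_repel() exactly like label.df_2.

```r
# Re-create the example data from the question
set.seed(1)
df <- do.call(rbind, lapply(seq(1, 20, 4), function(i)
  data.frame(x = rnorm(50, mean = i, sd = 1),
             y = rnorm(50, mean = i, sd = 1),
             cluster = i)))
df$cluster <- factor(df$cluster)

label.df <- data.frame(cluster = levels(df$cluster),
                       label = paste0("cluster: ", levels(df$cluster)),
                       stringsAsFactors = FALSE)

# One row per cluster: the upper-left corner of its bounding box
# (tapply() returns results in factor-level order, matching levels())
label_pos <- data.frame(
  cluster = levels(df$cluster),
  x = as.vector(tapply(df$x, df$cluster, min)),
  y = as.vector(tapply(df$y, df$cluster, max)),
  stringsAsFactors = FALSE
)

# Attach the label text by cluster id
label_pos <- merge(label_pos, label.df, by = "cluster")
```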

Question:

I am trying to make a geom_point where the text labels both repel, and point to their associated points even if I am using position=dodge or position=jitter. I also have a lot of points to label, which is why I want to use ggrepel or something similar. My understanding is that I cannot use the position argument for ggrepel.

Is there any way I can get a plot like this, except with the segments pointing to their associated points?

require(ggplot2)
require(ggrepel)
data("mtcars")
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)

dodge = position_dodge(1)
ggplot(mtcars, aes(x = am, y=mpg)) +
  geom_point(size=3, position=dodge, alpha=0.5, aes(color=cyl)) +
  geom_text_repel(data = mtcars,
                  aes(label = mpg, x=am, y=mpg),  alpha=0.9, size=4,
                  segment.size = .25, segment.alpha = .8, force = 1)


Answer:

Today, I updated ggrepel to support the position option in version 0.7.3.

Please try it out and let me know how it goes.

If you have issues, please report them here: https://github.com/slowkow/ggrepel/issues

Install version 0.7.3 of ggrepel:

devtools::install_github("slowkow/ggrepel")

Let's try it:

require(ggplot2)
require(ggrepel)
data("mtcars")
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$am <- as.factor(mtcars$am)

dodge <- position_dodge(1)
ggplot(mtcars, aes(x = am, y = mpg, label = mpg)) +
  geom_point(
    mapping = aes(color = cyl),
    position = dodge,
    size = 3,
    alpha = 0.5
  ) +
  geom_text_repel(
    mapping = aes(group = cyl),
    position = dodge,
    size = 4,
    alpha = 0.9,
    segment.size = .25,
    segment.alpha = .8,
    force = 1
  )
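On ggrepel versions older than 0.7.3, where position is ignored, one possible workaround (not from the original answer) is to compute the dodged x coordinate yourself and give the same numeric x to both layers. The sketch assumes three evenly spaced offsets (-1/3, 0, +1/3) approximate position_dodge(1), which should hold here because every am category in mtcars contains all three cyl groups. The plotting part only runs if ggplot2 and ggrepel are installed.

```r
mtcars2 <- mtcars
mtcars2$cyl <- as.factor(mtcars2$cyl)
mtcars2$am  <- as.factor(mtcars2$am)

# Manual dodge: shift each cyl group within an am category so the
# three groups share a total width of 1 (offsets -1/3, 0, +1/3)
dodge_width <- 1
n_groups    <- nlevels(mtcars2$cyl)
group_index <- as.integer(mtcars2$cyl)
mtcars2$x_dodge <- as.integer(mtcars2$am) +
  dodge_width * ((group_index - 0.5) / n_groups - 0.5)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggrepel", quietly = TRUE)) {
  library(ggplot2)
  library(ggrepel)
  # Both layers use the same precomputed x, so no 'position' is needed
  p <- ggplot(mtcars2, aes(x = x_dodge, y = mpg, label = mpg)) +
    geom_point(aes(color = cyl), size = 3, alpha = 0.5) +
    geom_text_repel(size = 4, alpha = 0.9, segment.size = 0.25,
                    segment.alpha = 0.8, force = 1) +
    # Put the am category names back on the now-continuous x axis
    scale_x_continuous(breaks = seq_len(nlevels(mtcars2$am)),
                       labels = levels(mtcars2$am))
  print(p)
}
```

Because the segments start from the same x as the points, they point at the dodged points rather than at the category centre.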

Question:

I tried the following code, but instead of showing the label in the middle, I want it to point to the leftmost bar. Thanks for the help!

library(tidyverse)
library(ggrepel)

mtcars %>% 
  group_by(am, cyl) %>% 
  slice(1) %>% 
  ggplot(aes(x = am, y = mpg, group = cyl, fill = cyl, label = mpg)) + 
  geom_bar(position = "dodge", stat = "identity") +
  geom_label_repel(data = mtcars %>% filter(am == 0, 
                                            cyl == 4) %>% 
                     slice(1),
                   nudge_x = 0.2,
                   nudge_y = 0.3,
                   aes(fill = NULL))

Created on 2019-01-22 by the reprex package (v0.2.1)


Answer:

This solution will work generally, for any combination of labels.

First, am looks like a binary variable, so I convert it to a factor to make the x axis a little cleaner. In your case, you want position_dodge(width = 1) for the labels, which automatically lines each label up above its corresponding bar. The simplest way I found to show only the label you want is to define a column mpg_lab that is NA for every label you don't want. Change the condition in the last mutate() to isolate different labels, or delete that line altogether to show all of them.

df <- mtcars %>% 
  mutate(am = factor(am)) %>% 
  group_by(am, cyl) %>% 
  slice(1) %>% 
  mutate(mpg_lab = ifelse(am == 0 & cyl == 4, mpg, NA))

df %>% 
  ggplot(aes(x = am, y = mpg, group = cyl, fill = cyl)) + 
  geom_bar(position = "dodge", stat = "identity") + 
  geom_label_repel(data = df,
                   aes(label = mpg_lab, fill = NULL),
                   position = position_dodge(width = 1),
                   point.padding = NA,
                   ylim = max(df$mpg_lab, na.rm = TRUE) * 1.02)

Two optional things I added: turning off point.padding in geom_label_repel() keeps the label from randomly moving side to side, and the ylim argument bumps the label up so you can see the arrow. You can play around with those options if you want something different.
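The NA-masking idea works for any subset of labels. Here is a small base-R check of the data step (an illustrative sketch, not from the original answer): it keeps the first row of each am/cyl combination, the same rows that group_by(am, cyl) %>% slice(1) keeps (dplyr sorts them differently), and leaves a label only on the am == 0, cyl == 4 bar. label_condition and label_floor are names chosen here.

```r
# First row of each am/cyl combination, in original mtcars order
bars <- mtcars[!duplicated(mtcars[c("am", "cyl")]), c("am", "cyl", "mpg")]
bars$am <- factor(bars$am)

# Keep the label only where the condition holds; NA everywhere else
label_condition <- bars$am == "0" & bars$cyl == 4
bars$mpg_lab <- ifelse(label_condition, bars$mpg, NA)

# The value used for ylim in the answer
label_floor <- max(bars$mpg_lab, na.rm = TRUE) * 1.02
```

Changing label_condition (for example to bars$cyl == 8) selects different bars without touching the plotting code.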