Hot questions for using ggplot2 in ggdendro

Question:

When I use the great ggdendro package to plot my tree, I run into a problem: one day, all the labels of the tree it produces suddenly disappeared.

When I run the following code on my local machine and on the server, I get different results: no labels on the local machine, while the labels do appear on the server.

fit = ClustOfVar::hclustvar(X.quanti = mtcars)

ggdendro::ggdendrogram(as.dendrogram(fit),rotate = TRUE)

Server version (OK):

Local Windows 7 version (no labels):

I printed the session info as well, for reference.

dput comparison (identical):


Answer:

This issue has been fixed in the latest development version of ggdendro, version 0.1.19.

The underlying issue is described in issue #24. The bug was exposed by a change of behaviour in ggplot2 and causes a problem with version 0.4 of the scales package. To be clear: the bug was in ggdendro, not in ggplot2 or scales, but it never surfaced with earlier versions of scales.

This version is not yet on CRAN, so use devtools to get the latest version:

devtools::install_github("andrie/ggdendro")

Update. Version 0.1-20 of ggdendro is now available on CRAN.
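
To tell whether a given machine still has an older build, the installed version can be compared against the CRAN release mentioned above. This is only a sketch: the helper name needs_update is hypothetical, and the 0.1-20 threshold is taken from the update note rather than from the package changelog.

```r
# Hypothetical helper: TRUE when `installed` is older than `minimum`.
# package_version() accepts both "0.1.20" and "0.1-20" style strings.
needs_update <- function(installed, minimum = "0.1-20") {
  package_version(installed) < package_version(minimum)
}

# Check the local library without installing anything automatically
if (!requireNamespace("ggdendro", quietly = TRUE) ||
    needs_update(as.character(utils::packageVersion("ggdendro")))) {
  message("ggdendro is missing or older than 0.1-20; ",
          "consider install.packages('ggdendro')")
}
```

Running the same check on both the server and the local machine would show whether the two installations differ.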

Question:

I am trying to create a dendrogram using the package dendextend. It creates really nice gg dendrograms but unfortunately when you turn it into a "circle", the labels do not keep up. I'll provide an example below.

My distance object is here: http://speedy.sh/JRVBS/mydist.RDS

library(dendextend)
library(ggplot2)
#library(devtools) ; install_github('kassambara/factoextra')
library(factoextra)


clus <- hcut(mydist, k = 6, hc_func = 'hclust', 
             hc_method = 'ward.D2', graph = FALSE, isdiss = TRUE)

dend <- as.dendrogram(clus)
labels(dend) <- paste0(paste0(rep(' ', 3), collapse = ''), labels(dend))
dend <- sort(dend, decreasing = FALSE)

ggd1 <- ggplot(dend %>%
                   set('branches_k_color', k = 6) %>%
                   set('branches_lwd', 0.6) %>%
                   set('labels_colors', k = 6) %>%
                   set('labels_cex', 0.6), 
               theme = theme_minimal(),
               horiz = TRUE)
ggd1 <- ggd1 + theme(panel.grid.major = element_blank(),
                     axis.text = element_blank(),
                     axis.title = element_blank())
ggd1 <- ggd1 + ylim(max(get_branches_heights(dend)), -3)

This basically gives me this image, which is great. However, I want to turn it into a circle, so I use:

ggd1 + coord_polar(theta = 'x') 

And I get this graph below. This is close to exactly what I want, but I just need to rotate the labels.

Any help is appreciated. I know that under the hood dendextend is basically creating a few data.frames and then calling geom_segment() and geom_text() on them to create the dendrogram and labels. I believe I can expose the associated data.frame as follows:

back.df1 <- dendextend::as.ggdend(dend)
back.df2 <- dendextend::prepare.ggdend(back.df1)

Another tactic might be to use ggplot(labels = FALSE...) when plotting, and then add geom_text() manually in some way that preserves the coloring but allows me to use geom_text(angle = ).

I also suspect that some combination of ggplot wizardry would let me take back.df2, recreate the first and second plots, and also control the angle of the labels. However, I do not know how to do any of this. I have already built out a lot using the dendextend package and would ideally like to avoid switching to a new package for creating dendrogram objects, because apart from the labels I really like it!


SOLUTION

I based this on the solution from Richard Telford below. I first created an edited version of ggplot.ggdend(), identical to the one provided in the answer below. I then created a function that builds the angle and hjust vectors automatically, so that the label rotation switches between 6 o'clock and 12 o'clock to improve readability.

createAngleHJustCols <- function(labeldf) {        
    nn <- length(labeldf$y)
    halfn <- floor(nn/2)
    firsthalf <- rev(90 + seq(0,360, length.out = nn))
    secondhalf <- rev(-90 + seq(0,360, length.out = nn))
    angle <- numeric(nn)
    angle[1:halfn] <- firsthalf[1:halfn]
    angle[(halfn+1):nn] <- secondhalf[(halfn+1):nn]

    hjust <- numeric(nn)
    hjust[1:halfn] <- 0
    hjust[(halfn+1):nn] <- 1

    return(list(angle = angle, hjust = hjust))
}
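
To see what the helper returns, it can be applied to a toy label table. The four-row data frame below is made up for illustration, and the function definition is repeated only so that the snippet runs on its own.

```r
# Same function as above, repeated so the snippet is self-contained
createAngleHJustCols <- function(labeldf) {
  nn <- length(labeldf$y)
  halfn <- floor(nn / 2)
  firsthalf <- rev(90 + seq(0, 360, length.out = nn))
  secondhalf <- rev(-90 + seq(0, 360, length.out = nn))
  angle <- numeric(nn)
  angle[1:halfn] <- firsthalf[1:halfn]
  angle[(halfn + 1):nn] <- secondhalf[(halfn + 1):nn]
  hjust <- numeric(nn)
  hjust[1:halfn] <- 0
  hjust[(halfn + 1):nn] <- 1
  list(angle = angle, hjust = hjust)
}

# Made-up label table with four leaves
toy_labels <- data.frame(x = 1:4, y = 0, label = c("a", "b", "c", "d"))
res <- createAngleHJustCols(toy_labels)
res$angle   # 450 330  30 -90
res$hjust   # 0 0 1 1
```

The two candidate angle vectors differ by 180 degrees at every position, so the second half of the labels is flipped and right-justified instead of being drawn upside down.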

I then produced the plot using the following code:

gdend <- dendextend::as.ggdend(dend %>%
                                   set('branches_k_color', k = 6) %>%
                                   set('branches_lwd', 0.6) %>%
                                   set('labels_colors', k = 6) %>%
                                   set('labels_cex', 0.6))

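# horiz, polarplot and plottitle are not defined in this snippet; they are
# presumably set elsewhere (e.g. as arguments of an enclosing function)
gdend$labels$angle <- ifelse(horiz, 0, 90)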
gdend$labels$hjust <- 0
gdend$labels$vjust <- 0.5

# if polar, change the angle and hjust so that the labels rotate
if(polarplot) {
    newvalues <- createAngleHJustCols(gdend$labels)
    gdend$labels$angle <- newvalues[['angle']]
    gdend$labels$hjust <- newvalues[['hjust']]
}

ggresult <- newggplot.ggdend(gdend, horiz = TRUE, offset_labels = -2) 
ggresult <- ggresult + ggtitle(plottitle)
ggresult <- ggresult + theme(plot.margin = margin(2, 2, 2, 2),
                             axis.text = element_blank(),
                             plot.title = element_text(margin = margin(10, 2, 2, 2)))
ggresult <- ggresult + ylim(max(get_branches_heights(dend)), -5)
ggresult <- ggresult + coord_polar(theta = 'x', direction = 1)

And that ultimately produced this final plot!

(I changed a couple things in the data so some of the order may appear different in the plot)


Answer:

This is possible, but you first need to edit dendextend:::ggplot.ggdend so that it accepts the angle aesthetic (and also hjust and vjust).

Step 1: edit dendextend:::ggplot.ggdend

newggplot.ggdend <- function (data, segments = TRUE, labels = TRUE, nodes = TRUE, 
          horiz = FALSE, theme = theme_dendro(), offset_labels = 0, ...) {
  data <- prepare.ggdend(data)
  #angle <- ifelse(horiz, 0, 90)
  #hjust <- ifelse(horiz, 0, 1)
  p <- ggplot()
  if (segments) {
    p <- p + geom_segment(data = data$segments, aes_string(x = "x", y = "y", xend = "xend", yend = "yend", colour = "col", linetype = "lty", size = "lwd"), lineend = "square") + 
      guides(linetype = FALSE, col = FALSE) + scale_colour_identity() + 
      scale_size_identity() + scale_linetype_identity()
  }
  if (nodes) {
    p <- p + geom_point(data = data$nodes, aes_string(x = "x", y = "y", colour = "col", shape = "pch", size = "cex")) + 
      guides(shape = FALSE, col = FALSE, size = FALSE) + 
      scale_shape_identity()
  }
  if (labels) {
    data$labels$cex <- 5 * data$labels$cex
    data$labels$y <- data$labels$y + offset_labels
    p <- p + geom_text(data = data$labels, aes_string(x = "x", y = "y", label = "label", colour = "col", size = "cex", angle = "angle", hjust = "hjust", vjust = "vjust"))#edited
  }
  if (horiz) {
    p <- p + coord_flip() + scale_y_reverse(expand = c(0.2, 0))
  }
  if (!is.null(theme)) {
    p <- p + theme
  }
  p
}

assignInNamespace(x = "ggplot.ggdend", ns = "dendextend", value = newggplot.ggdend)

Step 2: Make the data object

gdend <- dendextend::as.ggdend(dend %>%
                        set('branches_k_color', k = 6) %>%
                        set('branches_lwd', 0.6) %>%
                        set('labels_colors', k = 6) %>%
                        set('labels_cex', 0.6),
                      theme = theme_minimal(),
                      horiz = TRUE)
gdend$labels$angle <- seq(90, -270, length = nrow(gdend$labels))
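# Note: the division by 180 sits outside cos()/sin(), so both values stay
# within about +/- 0.006, i.e. vjust and hjust are effectively 0
gdend$labels$vjust <- cos(gdend$labels$angle * pi) / (180)
gdend$labels$hjust <- sin(gdend$labels$angle * pi) / (180)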

Step 3: plot

ggd1 <- ggplot(gdend)
ggd1 <- ggd1 + theme(panel.grid.major = element_blank(),
                     axis.text = element_blank(),
                     axis.title = element_blank())
ggd1 <- ggd1 + ylim(max(get_branches_heights(dend)), -3)
ggd1
ggd1 + coord_polar(theta = 'x') 

Question:

I want to create a beautiful dendrogram by using ggplot2.

This is a reproducible example of what I'm doing:

library(ggplot2)
library(ggdendro)
data(mtcars)
x <- as.matrix(scale(mtcars))
dd.row <- as.dendrogram(hclust(dist(t(x))))

mtcars_dendrogram <- ggdendrogram(dd.row, rotate = TRUE, theme_dendro = FALSE) +
  labs(x="", y="Distance") +
  ggtitle("Mtcars Dendrogram") + 
  theme(panel.border = element_rect(colour = "black", fill=NA, size=.5), 
        axis.text.x=element_text(colour="black", size = 10), 
        axis.text.y=element_text(colour="black", size = 10),
        legend.key=element_rect(fill="white", colour="white"),
        legend.position="bottom", legend.direction="horizontal", 
        legend.title = element_blank(),
        panel.grid.major = element_line(colour = "#d3d3d3"), 
        panel.grid.minor = element_blank(), 
        panel.background = element_blank(),
        plot.title = element_text(size = 14, family = "Tahoma", face = "bold"), 
        text=element_text(family="Tahoma"))
mtcars_dendrogram <- mtcars_dendrogram +
  annotate("rect", xmin = 0.6, xmax = 5.4, ymin = 0, ymax = 6.4, fill="red", colour="red", alpha=0.1) +
  annotate("rect", xmin = 5.6, xmax = 7.4, ymin = 0, ymax = 6.4, fill="blue", colour="blue", alpha=0.1) +
  annotate("rect", xmin = 7.6, xmax = 11.4, ymin = 0, ymax = 6.4, fill="orange", colour="orange", alpha=0.1) +
  geom_hline(yintercept = 6.4, color = "blue", size=1, linetype = "dotted")
mtcars_dendrogram

This is the result

I want to extend the rectangles so that they cover the x-axis. If I change, for example,

annotate("rect", xmin = 5.6, xmax = 7.4, ymin = 0, ymax = 6.4, fill="blue", colour="blue", alpha=0.1)

to

annotate("rect", xmin = 5.6, xmax = 7.4, ymin = -1, ymax = 6.4, fill="blue", colour="blue", alpha=0.1)

Then I get this

This is what I want to obtain (this result was altered with Photoshop)

Any help is highly welcome. Thanks a lot beforehand.


Answer:

You can fake the left axis:

mtcars_dendrogram <- mtcars_dendrogram +
  annotate("rect", xmin = 0.6, xmax = 5.4,  ymin = -1, ymax = 6.4, fill="red", colour="red", alpha=0.1) +
  annotate("rect", xmin = 5.6, xmax = 7.4,  ymin = -1, ymax = 6.4, fill="blue", colour="blue", alpha=0.1) +
  annotate("rect", xmin = 7.6, xmax = 11.4, ymin = -1, ymax = 6.4, fill="orange", colour="orange", alpha=0.1) +
  geom_hline(yintercept = 6.4, color = "blue", size=1, linetype = "dotted") +
  theme(axis.text.y = element_blank(),
        axis.line.y = element_blank(),
        axis.ticks.y = element_blank()) + 
  geom_text(aes(y = 0, x = 1:11, 
                label = c("carb", "wt", "hp", "cyl", "disp", "qsec", "vs", "mpg", "drat", "am", "gear")),
            hjust = "right",
            nudge_y = -.1)
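
The label vector passed to geom_text() is typed out by hand. As a sketch, the same names can be read from the dendrogram itself with base R's labels(), which returns the leaf labels in plotting order. Whether that order matches the hand-typed vector exactly is an assumption worth checking against your own plot.

```r
# Rebuild the dendrogram from the question (base R only)
x <- as.matrix(scale(mtcars))
dd.row <- as.dendrogram(hclust(dist(t(x))))

# Leaf labels in dendrogram order; leaf i is drawn at x = i
leaf_labels <- labels(dd.row)
leaf_positions <- seq_along(leaf_labels)

# Small lookup table that could feed geom_text(aes(x = x, label = label))
data.frame(x = leaf_positions, label = leaf_labels)
```

Deriving the labels this way keeps the faked axis in sync if the data or the clustering method changes.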

Question:

The idea is to combine R packages ClustOfVar and ggdendro to give a visual summary of variable clustering.

When there are few columns in the data, the result is pretty good, except that some areas are not covered (as circled in the chart below). Using mtcars as an example:

library(plyr)
library(ggplot2)
library(gtable)
library(grid)
library(gridExtra)

library(ClustOfVar)
library(ggdendro)

fit = hclustvar(X.quanti = mtcars)

labels = cutree(fit,k = 5)

labelx = data.frame(Names=names(labels),group = paste("Group",as.vector(labels)),num=as.vector(labels))

p1 = ggdendrogram(as.dendrogram(fit), rotate=TRUE)

df2<-data.frame(cluster=cutree(fit, k =5), states=factor(fit$labels,levels=fit$labels[fit$order]))
df3<-ddply(df2,.(cluster),summarise,pos=mean(as.numeric(states)))

p2 = ggplot(df2,aes(states,y=1,fill=factor(cluster)))+geom_tile()+
  scale_y_continuous(expand=c(0,0))+
  theme(axis.title=element_blank(),
        axis.ticks=element_blank(),
        axis.text=element_blank(),
        legend.position="none")+coord_flip()+
  geom_text(data=df3,aes(x=pos,label=cluster))
gp1<-ggplotGrob(p1)
gp2<-ggplotGrob(p2)  
maxHeight = grid::unit.pmax(gp1$heights[2:5], gp2$heights[2:5])
gp1$heights[2:5] <- as.list(maxHeight)
gp2$heights[2:5] <- as.list(maxHeight)
grid.arrange(gp2, gp1, ncol=2,widths=c(1/6,5/6))

When there is a large number of columns, another issue occurs: the height of the color tiles does not match the height of the dendrogram.

library(ClustOfVar)
library(ggdendro)
X = data.frame(mtcars,mtcars,mtcars,mtcars,mtcars,mtcars)

fit = hclustvar(X.quanti = X)

labels = cutree(fit,k = 5)

labelx = data.frame(Names=names(labels),group = paste("Group",as.vector(labels)),num=as.vector(labels))

p1 = ggdendrogram(as.dendrogram(fit), rotate=TRUE)

df2<-data.frame(cluster=cutree(fit, k =5), states=factor(fit$labels,levels=fit$labels[fit$order]))
df3<-ddply(df2,.(cluster),summarise,pos=mean(as.numeric(states)))

p2 = ggplot(df2,aes(states,y=1,fill=factor(cluster)))+geom_tile()+
  scale_y_continuous(expand=c(0,0))+
  theme(axis.title=element_blank(),
        axis.ticks=element_blank(),
        axis.text=element_blank(),
        legend.position="none")+coord_flip()+
  geom_text(data=df3,aes(x=pos,label=cluster))
gp1<-ggplotGrob(p1)
gp2<-ggplotGrob(p2)  
maxHeight = grid::unit.pmax(gp1$heights[2:5], gp2$heights[2:5])
gp1$heights[2:5] <- as.list(maxHeight)
gp2$heights[2:5] <- as.list(maxHeight)
grid.arrange(gp2, gp1, ncol=2,widths=c(1/6,5/6))

@Sandy Muspratt has actually provided an excellent solution to this, provided that R is upgraded to version 3.3.1 (see "R: ggplot slight adjustment for clustering summary").

But since I cannot change the version of R deployed on the corporate server, I wonder whether there is any other workaround that can align these two parts.


Answer:

As far as I can tell, your code is not far wrong. The problem is that you are trying to match a continuous scale to a discrete scale when you merge the two plots. Also, it appears that ggdendrogram() adds additional space to the y-axis.

library(plyr)
library(ggplot2)
library(gtable)
library(grid)
library(gridExtra)

library(ClustOfVar)
library(ggdendro)

# Data
X = data.frame(mtcars,mtcars,mtcars,mtcars,mtcars,mtcars)

# Cluster analysis
fit = hclustvar(X.quanti = X)

# Labels data frames
df2 <- data.frame(cluster = cutree(fit, k =5), 
     states = factor(fit$labels, levels = fit$labels[fit$order]))
df3 <- ddply(df2, .(cluster), summarise, pos = mean(as.numeric(states)))

# Dendrogram
# scale_x_continuous() for p1 should match scale_x_discrete() from p2
# scale_x_continuous() strips off the labels, so I grab them from df2
# scale_y_continuous() puts a little space between the labels and the dendrogram
p1 <- ggdendrogram(as.dendrogram(fit), rotate = TRUE) +
     scale_x_continuous(expand = c(0, 0.5), labels = levels(df2$states), breaks = 1:length(df2$states)) +
     scale_y_continuous(expand = c(0.02, 0)) 

# Tiles and labels
p2 <- ggplot(df2,aes(states, y = 1, fill = factor(cluster))) +
  geom_tile() +
  scale_y_continuous(expand = c(0, 0)) + 
  scale_x_discrete(expand = c(0, 0)) +
  geom_text(data = df3, aes(x = pos, label = cluster)) +
  coord_flip() +
  theme(axis.title = element_blank(),
        axis.ticks = element_blank(),
        axis.text = element_blank(),
        legend.position = "none")

# Get the ggplot grobs
gp1 <- ggplotGrob(p1)
gp2 <- ggplotGrob(p2)  

# Make sure the heights match
maxHeight <- unit.pmax(gp1$heights, gp2$heights)
gp1$heights <- as.list(maxHeight)
gp2$heights <- as.list(maxHeight)

# Combine the two plots
grid.arrange(gp2, gp1, ncol = 2,widths = c(1/6, 5/6))
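
The two scales can be lined up because a discrete scale places its levels at the integer positions 1, 2, ..., n, and the dendrogram puts its leaves at those same numeric positions. A minimal base-R sketch of that correspondence, using hclust() on mtcars as a stand-in for hclustvar() (an assumption made so that the snippet needs no extra packages):

```r
# Stand-in clustering of the mtcars variables
hc <- hclust(dist(t(scale(mtcars))))

# Same construction as df2$states: levels follow the dendrogram order
states <- factor(hc$labels, levels = hc$labels[hc$order])

# Integer position of each variable on a discrete axis
pos <- as.numeric(states)
names(pos) <- hc$labels
sort(pos)

# Leaves of the dendrogram, in order, are the factor levels in order
identical(levels(states), labels(as.dendrogram(hc)))
```

With both plots using positions 1 to n, what remains is to give the two scales compatible expansion, which is what the expand arguments in the answer's code do.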

Question:

Here is my data:

ddata1 <- structure(list(segments = structure(list(x = c(8.203125, 1.5, 
1.5, 1, 1.5, 2, 8.203125, 14.90625, 14.90625, 10.0625, 10.0625, 
5.6875, 5.6875, 3.875, 3.875, 3, 3.875, 4.75, 4.75, 4, 4.75, 
5.5, 5.5, 5, 5.5, 6, 5.6875, 7.5, 7.5, 7, 7.5, 8, 10.0625, 14.4375, 
14.4375, 12.125, 12.125, 10.5, 10.5, 9.5, 9.5, 9, 9.5, 10, 10.5, 
11.5, 11.5, 11, 11.5, 12, 12.125, 13.75, 13.75, 13, 13.75, 14.5, 
14.5, 14, 14.5, 15, 14.4375, 16.75, 16.75, 16, 16.75, 17.5, 17.5, 
17, 17.5, 18, 14.90625, 19.75, 19.75, 19, 19.75, 20.5, 20.5, 
20, 20.5, 21), y = c(0.597091229013013, 0.597091229013013, 0.353069357803605, 
0.353069357803605, 0.353069357803605, 0.353069357803605, 0.597091229013013, 
0.597091229013013, 0.448435999122362, 0.448435999122362, 0.390288662068433, 
0.390288662068433, 0.277787356115265, 0.277787356115265, 0.209941905126808, 
0.209941905126808, 0.209941905126808, 0.209941905126808, 0.179837725036859, 
0.179837725036859, 0.179837725036859, 0.179837725036859, 0.136782743294966, 
0.136782743294966, 0.136782743294966, 0.136782743294966, 0.277787356115265, 
0.277787356115265, 0.227863143853408, 0.227863143853408, 0.227863143853408, 
0.227863143853408, 0.390288662068433, 0.390288662068433, 0.356332108523753, 
0.356332108523753, 0.307670014691839, 0.307670014691839, 0.255894447541048, 
0.255894447541048, 0.145256016771056, 0.145256016771056, 0.145256016771056, 
0.145256016771056, 0.255894447541048, 0.255894447541048, 0.221845947877971, 
0.221845947877971, 0.221845947877971, 0.221845947877971, 0.307670014691839, 
0.307670014691839, 0.29024123584904, 0.29024123584904, 0.29024123584904, 
0.29024123584904, 0.255131135079098, 0.255131135079098, 0.255131135079098, 
0.255131135079098, 0.356332108523753, 0.356332108523753, 0.337359353946151, 
0.337359353946151, 0.337359353946151, 0.337359353946151, 0.202624960168806, 
0.202624960168806, 0.202624960168806, 0.202624960168806, 0.448435999122362, 
0.448435999122362, 0.438580594379611, 0.438580594379611, 0.438580594379611, 
0.438580594379611, 0.359137362193916, 0.359137362193916, 0.359137362193916, 
0.359137362193916), xend = c(1.5, 1.5, 1, 1, 2, 2, 14.90625, 
14.90625, 10.0625, 10.0625, 5.6875, 5.6875, 3.875, 3.875, 3, 
3, 4.75, 4.75, 4, 4, 5.5, 5.5, 5, 5, 6, 6, 7.5, 7.5, 7, 7, 8, 
8, 14.4375, 14.4375, 12.125, 12.125, 10.5, 10.5, 9.5, 9.5, 9, 
9, 10, 10, 11.5, 11.5, 11, 11, 12, 12, 13.75, 13.75, 13, 13, 
14.5, 14.5, 14, 14, 15, 15, 16.75, 16.75, 16, 16, 17.5, 17.5, 
17, 17, 18, 18, 19.75, 19.75, 19, 19, 20.5, 20.5, 20, 20, 21, 
21), yend = c(0.597091229013013, 0.353069357803605, 0.353069357803605, 
0, 0.353069357803605, 0, 0.597091229013013, 0.448435999122362, 
0.448435999122362, 0.390288662068433, 0.390288662068433, 0.277787356115265, 
0.277787356115265, 0.209941905126808, 0.209941905126808, 0, 0.209941905126808, 
0.179837725036859, 0.179837725036859, 0, 0.179837725036859, 0.136782743294966, 
0.136782743294966, 0, 0.136782743294966, 0, 0.277787356115265, 
0.227863143853408, 0.227863143853408, 0, 0.227863143853408, 0, 
0.390288662068433, 0.356332108523753, 0.356332108523753, 0.307670014691839, 
0.307670014691839, 0.255894447541048, 0.255894447541048, 0.145256016771056, 
0.145256016771056, 0, 0.145256016771056, 0, 0.255894447541048, 
0.221845947877971, 0.221845947877971, 0, 0.221845947877971, 0, 
0.307670014691839, 0.29024123584904, 0.29024123584904, 0, 0.29024123584904, 
0.255131135079098, 0.255131135079098, 0, 0.255131135079098, 0, 
0.356332108523753, 0.337359353946151, 0.337359353946151, 0, 0.337359353946151, 
0.202624960168806, 0.202624960168806, 0, 0.202624960168806, 0, 
0.448435999122362, 0.438580594379611, 0.438580594379611, 0, 0.438580594379611, 
0.359137362193916, 0.359137362193916, 0, 0.359137362193916, 0
)), .Names = c("x", "y", "xend", "yend"), row.names = c(NA, 80L
), class = "data.frame"), labels = structure(list(x = c(1, 2, 
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21), y = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0), label = structure(1:21, .Label = c("R72", "R73", 
"R13", "R62", "R22", "R42", "R31", "R52", "R32", "R33", "R41", 
"R43", "R63", "R21", "R51", "R61", "R11", "R12", "R53", "R23", 
"R71"), class = "factor")), .Names = c("x", "y", "label"), row.names = c(NA, 
21L), class = "data.frame"), leaf_labels = NULL, class = "hclust"), .Names = c("segments", 
"labels", "leaf_labels", "class"), class = "dendro")

I add group factors:

labs1 <- ggdendro::label(ddata1)
labs1$groups <- as.factor(c("G", "G", "A", "F","B","D","C","E","C","C","D","D","F","B","E","F","A","A","E","B","G"))

The following code

library(ggplot2)
library(scales)
library(ggdendro)
ggplot(segment(ddata1)) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  geom_text(data=label(ddata1), aes(label=label, x=x, y=-0.05, col=labs1$groups)) +
  coord_flip() + 
  scale_y_reverse(expand = c(0.2, 0)) +
  theme_bw() +
  theme(axis.text.y=element_blank(),
        legend.text=element_text(size=14, face="bold")) +
  xlab("") +
  ylab("")

produces this plot:

When I add

+
guides(colour = guide_legend("Treatment", override.aes = list(size = 7, shape=19)))

the legend title changes and the size of the legend keys changes, but the shape argument is ignored:

Why is that?

Thank you!


Answer:

To get points in the legend, you need to add a geom_point layer. To prevent it from appearing in your actual plot, you can set alpha = 0 in the geom itself, then set alpha = 1 in the override:

ggplot(segment(ddata1)) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  geom_text(data=label(ddata1), aes(label=label, x=x, y=-0.05, col=labs1$groups), show.legend = FALSE) +
  geom_point(data=label(ddata1), aes(x=x, y=-0.05, col=labs1$groups), alpha = 0) +
  coord_flip() + 
  scale_y_reverse(expand = c(0.2, 0)) +
  theme_bw() +
  theme(axis.text.y=element_blank(),
        legend.text=element_text(size=14, face="bold")) +
  xlab("") +
  ylab("") +
  guides(colour = guide_legend("Treatment", override.aes = list(size = 7, alpha = 1)))
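
The same trick in isolation, on a made-up three-row data frame (the data and column names here are illustrative only): the text layer is kept out of the legend, and a fully transparent point layer supplies the legend keys. As far as I can tell, the legend key for a text layer is a letter glyph, which has no point shape to override, and that would explain why shape was ignored in the question.

```r
library(ggplot2)

# Made-up data: three labelled leaves in two groups
toy <- data.frame(x = 1:3, y = 0,
                  label = c("R1", "R2", "R3"),
                  grp = c("A", "B", "A"))

p <- ggplot(toy, aes(x = x, y = y, colour = grp)) +
  # coloured text, but no legend entry from this layer
  geom_text(aes(label = label), show.legend = FALSE) +
  # invisible points whose only job is to create point keys in the legend
  geom_point(alpha = 0) +
  # make the keys visible and large in the legend only
  guides(colour = guide_legend("Treatment",
                               override.aes = list(size = 7, alpha = 1)))
p
```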

Question:

My task is to create a dendrogram, but the leaf nodes show blunt edges. How can I extend the length of the leaf nodes and add node labels?

Please see the current and expected images below.

Data:

df1 <- data.frame( z1 = c(rep('P1', 5), rep('P2', 5), rep('P3', 3), rep('P4', 4)),
                   z2 = c(letters[1:5], letters[6:10], letters[11:13], letters[14:17]),
                   stringsAsFactors = FALSE)

Code:

library('data.table')
library('ggplot2')
library('ggdendro')
library('grid')

setDT(df1)
ddata <- dcast( data = df1[, .(z1, z2)],
                formula = z2 ~ z1, 
                fill = 0, 
                fun.aggregate = length, 
                value.var = 'z2')
setDF( ddata)
row.names(ddata) <- ddata$z2
ddata$z2 <- NULL
ddata <- dendro_data( as.dendrogram( hclust( dist( ddata), method = "average")))
p <- ggplot(segment(ddata)) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  theme_dendro()
print(p)

Current plot:

Expected Plot:


Answer:

There are a couple of ways to do this. The simplest is probably to apply a function recursively over the nodes of the dendrogram using dendrapply.

If you insert a new line to assign the dendrogram object:

dendro <-  as.dendrogram(hclust(dist(ddata), method = "average"))

and then create a simple function that reduces the height of leaf nodes by a given amount (d):

dropleaf <- function(x, d = 1){
  if(is.leaf(x)) attr(x, "height") <- attr(x, "height") - d
  return(x)
}

The function can be applied over all nodes as follows:

dendro <- dendrapply(dendro, dropleaf, d = 0.2)
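
To check the effect on a small case, the sketch below applies the same function to a toy dendrogram built from five made-up points. The helper leaf_heights is hypothetical and exists only to inspect the result.

```r
# Toy data: five points on a line (made up for illustration)
toy <- matrix(c(0, 1, 3, 6, 10), ncol = 1,
              dimnames = list(letters[1:5], "v"))
dendro <- as.dendrogram(hclust(dist(toy), method = "average"))

# Same function as above, repeated so the snippet runs on its own
dropleaf <- function(x, d = 1) {
  if (is.leaf(x)) attr(x, "height") <- attr(x, "height") - d
  x
}

# Hypothetical helper: collect the heights of all leaves
leaf_heights <- function(node) {
  if (is.leaf(node)) return(attr(node, "height"))
  unlist(lapply(seq_along(node), function(i) leaf_heights(node[[i]])))
}

leaf_heights(dendro)                               # leaves before the change
dropped <- dendrapply(dendro, dropleaf, d = 0.2)
leaf_heights(dropped)                              # each leaf lowered by d
attr(dropped, "height") == attr(dendro, "height")  # root height is untouched
```

Only the leaves move, so the internal branch heights, and therefore the distances read off the tree, stay as they were.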

If you intend to plot the axis you can re-scale the plot so that the lowest point is reset to zero using:

dendro <- phylogram::reposition(dendro, shift = "reset")

You can then proceed with the rest of your code:

ddata <- dendro_data(dendro)
p <- ggplot(segment(ddata)) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  theme_dendro()
print(p)

producing the following output: