Hot questions for Using ggplot2 in GGally

Question:

I am trying to replicate this simple example given in the Coursera R Regression Models course:

require(datasets)
data(swiss)
require(GGally)
require(ggplot2)
ggpairs(swiss, lower = list(continuous = "smooth", params = c(method = "loess")))

I expect to see a 6x6 pairs plot - one scatterplot with loess smoother and confidence intervals for each combination of the 6 variables in the swiss data.

However, I get the following error:

Error in display_param_error() : 'params' is a deprecated argument. Please 'wrap' the function to supply arguments. help("wrap", package = "GGally")

I looked through the ggpairs() and wrap() help files and have tried lots of permutations of the wrap() and wrap_fn_with_param_arg() functions.

I can get this to work as expected:

ggpairs(swiss, lower = list(continuous = wrap("smooth")))

But once I add the loess part in, it does not:

ggpairs(swiss, lower = list(continuous = wrap("smooth"), method = wrap("loess")))

This is the error I get with the line above:

Error in value[3L] : The following ggpair plot functions are readily available: continuous: c('points', 'smooth', 'density', 'cor', 'blank') combo: c('box', 'dot', 'facethist', 'facetdensity', 'denstrip', 'blank') discrete: c('ratio', 'facetbar', 'blank') na: c('na', 'blank')

diag continuous: c('densityDiag', 'barDiag', 'blankDiag') diag discrete: c('barDiag', 'blankDiag') diag na: c('naDiag', 'blankDiag')

You may also provide your own function that follows the api of function(data, mapping, ...){ . . . } and returns a ggplot2 plot object Ex: my_fn <- function(data, mapping, ...){ p <- ggplot(data = data, mapping = mapping) + geom_point(...) p } ggpairs(data, lower = list(continuous = my_fn))

Function provided: loess

Obviously I am entering loess in the wrong place. Can anyone help me understand how to add the loess part in?

Note that my problem is different to this one, as I am asking how to implement loess in ggpairs since the params argument became deprecated.

Thanks very much.


Answer:

One quick way is to write your own function. The one below was adapted from the example given in the ggpairs error message in your question.

library(GGally)
library(ggplot2)    
data(swiss)

# Function to return points and geom_smooth
# allow for the method to be changed
my_fn <- function(data, mapping, method = "loess", ...){
  p <- ggplot(data = data, mapping = mapping) + 
    geom_point() + 
    geom_smooth(method = method, ...)
  p
}

# Default loess curve    
ggpairs(swiss[1:4], lower = list(continuous = my_fn))

# Use wrap to add further arguments; change method to lm
ggpairs(swiss[1:4], lower = list(continuous = wrap(my_fn, method="lm")))


This version perhaps gives a bit more control over the arguments passed to each geom_ layer:

my_fn <- function(data, mapping, pts = list(), smt = list(), ...){
  ggplot(data = data, mapping = mapping, ...) + 
    do.call(geom_point, pts) +
    do.call(geom_smooth, smt) 
}

# Plot 
ggpairs(swiss[1:4], 
        lower = list(continuous = 
                       wrap(my_fn,
                            pts = list(size = 2, colour = "red"), 
                            smt = list(method = "lm", se = FALSE, size = 5, colour = "blue"))))
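If your GGally is recent enough, the built-in "smooth" panel may make the custom function unnecessary. In current releases, ggally_smooth() appears to take its own method argument, so wrap() can forward method = "loess" to it directly. This is a sketch under that assumption (check ?ggally_smooth for your installed version):

```r
library(GGally)   # also attaches ggplot2
data(swiss)

# Assumes ggally_smooth() has a `method` argument (recent GGally releases);
# wrap() forwards method = "loess" to it instead of the deprecated params.
pm <- ggpairs(swiss,
              lower = list(continuous = wrap("smooth", method = "loess")))
pm
```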

Question:

Consider this example:

data(tips, package = "reshape")
library(GGally)
pm <- ggpairs(tips, mapping = aes(color = sex), columns = c("total_bill", "time", "tip"))
pm

How do I make the density plots more transparent and remove the black lines?

The GGally package seems to have changed a lot recently, and I cannot find a working solution.

Update:

I found how to change the alpha with a custom function:

my_dens <- function(data, mapping, ..., low = "#132B43", high = "#56B1F7") {
  ggplot(data = data, mapping=mapping) +
    geom_density(..., alpha=0.7) 
}

pm <- ggpairs(tips, mapping = aes(color = sex), columns = c("total_bill", "time", "tip"),
              diag=list(continuous=my_dens))
pm

but the black line still remains.


Answer:

Thanks to @Henrik, this is the solution using a custom function:

my_dens <- function(data, mapping, ...) {
  ggplot(data = data, mapping=mapping) +
    geom_density(..., alpha = 0.7, color = NA) 
}

pm <- ggpairs(tips, mapping = aes(color = sex), columns = c("total_bill", "time", "tip"),
              diag = list(continuous = my_dens))
pm

Examples of how to customize ggpairs plots can be found in the vignette; see the "Matrix Sections" and "Plot Matrix Subsetting" sections.

Question:

The points in ggpairs are way too big. How do I make them smaller?


Answer:

This basically requires reading the help page and working through the examples. It turns out that there are (at least) two different sets of attributes that can affect point size; the two I found are shown below.

require(ggplot2)
require(GGally)
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 200), ]

# Custom example (almost directly from the help page)
pm <- ggpairs(
  diamonds.samp[, 1:5],
  mapping = ggplot2::aes(color = cut),
  upper = list(continuous = wrap("density", alpha = 0.5), combo = "box"),
  lower = list(continuous = wrap("points", alpha = 0.3, size = 0.1), 
               combo = wrap("dot", alpha = 0.4, size = 0.2)),
  title = "Diamonds"
)
pm
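For a plain, uncoloured matrix the same idea reduces to a single wrap() call in the lower section. Here is a minimal sketch that uses the built-in iris data only so it runs on its own; the size and alpha values are arbitrary:

```r
library(GGally)

# Smaller, semi-transparent points in the lower triangle only;
# size and alpha are passed through wrap() to the "points" panel function.
pm <- ggpairs(iris, columns = 1:4,
              lower = list(continuous = wrap("points", size = 0.5, alpha = 0.5)))
pm
```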

Question:

I am creating a scatter plot matrix using GGally::ggpairs. I am using a custom function (below called my_fn) to create the bottom-left non-diagonal subplots. In the process of calling that custom function, there is information about each of these subplots that is calculated, and that I would like to store for later.

In the example below, each h@cID is an int[] structure with 100 values. In total, it is created 10 times in my_fn (once for each of the 10 bottom-left non-diagonal subplots). I am trying to store all 10 of these h@cID structures in the listCID list object.

I have not had success with this approach, and I have tried a few other variants (such as trying to put listCID as an input parameter to my_fn, or trying to return it in the end).

Is it possible for me to store the ten h@cID structures efficiently through my_fn to be used later? I feel there are several syntax issues that I am not entirely familiar with that may explain why I am stuck, and likewise I would be happy to change the title of this question if I am not using appropriate terminology. Thank you!

library(hexbin)
library(GGally)
library(ggplot2)

set.seed(1)

bindata <- data.frame(
    ID = paste0("ID", 1:100), 
    A = rnorm(100), B = rnorm(100), C = rnorm(100), 
    D = rnorm(100), E = rnorm(100))
bindata$ID <- as.character(bindata$ID)

maxVal <- max(abs(bindata[ ,2:6]))
maxRange <- c(-1 * maxVal, maxVal)

listCID <- c()

my_fn <- function(data, mapping, ...){
  x <- data[ ,c(as.character(mapping$x))]
  y <- data[ ,c(as.character(mapping$y))]
  h <- hexbin(x=x, y=y, xbins=5, shape=1, IDs=TRUE, 
              xbnds=maxRange, ybnds=maxRange)
  hexdf <- data.frame(hcell2xy(h),  hexID=h@cell, counts=h@count)
  listCID <- c(listCID, h@cID)
  print(listCID)
  p <- ggplot(hexdf, aes(x=x, y=y, fill=counts, hexID=hexID)) + 
            geom_hex(stat="identity")
  p
}

p <- ggpairs(bindata[ ,2:6], lower=list(continuous=my_fn))
p


Answer:

If I understand your problem correctly, this can be achieved quite easily, albeit inelegantly, with the <<- operator.

With it you can assign to a variable outside the scope of your function: <<- rebinds the first existing listCID it finds in the enclosing environments, which here is the global one.

Set listCID <- NULL before executing the function, and use listCID <<- c(listCID, h@cID) inside the function.

listCID = NULL

my_fn <- function(data, mapping, ...){
  x <- data[, c(as.character(mapping$x))]
  y <- data[, c(as.character(mapping$y))]
  h <- hexbin(x = x, y = y, xbins = 5, shape = 1, IDs = TRUE,
              xbnds = maxRange, ybnds = maxRange)
  hexdf <- data.frame(hcell2xy(h), hexID = h@cell, counts = h@count)

  # Append this panel's cell IDs to the global listCID
  if (exists("listCID")) listCID <<- c(listCID, h@cID)

  print(listCID)
  p <- ggplot(hexdf, aes(x = x, y = y, fill = counts, hexID = hexID)) +
    geom_hex(stat = "identity")
  p
}

For more on scoping, please refer to Hadley's excellent Advanced R: http://adv-r.had.co.nz/Environments.html
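Two things are worth knowing before relying on the global variable. First, c(listCID, h@cID) splices every panel's IDs into one long integer vector, so the ten pieces are no longer kept separate. Second, ggpairs seems to build its panels when the matrix is printed, so printing the plot again may call my_fn again and append duplicates. The sketch below avoids both by keeping a named list inside a closure, keyed by panel. make_collector() and fake_panel() are hypothetical names, and fake_panel() only stands in for my_fn so that the example runs without hexbin or GGally:

```r
# Hypothetical helper: a closure that keeps results in its own environment
# instead of the global one.
make_collector <- function() {
  store <- list()
  list(
    # `<<-` rebinds `store` in make_collector()'s environment
    add = function(key, value) store[[key]] <<- value,
    get = function() store
  )
}

cid <- make_collector()

# Stand-in for my_fn: in the real panel function the key would be built
# from the two variable names and the value would be h@cID.
fake_panel <- function(xvar, yvar, values) {
  cid$add(paste(xvar, yvar, sep = "_vs_"), values)
  invisible(NULL)
}

fake_panel("A", "B", c(1L, 2L, 2L))
fake_panel("A", "C", c(3L, 1L, 1L))
fake_panel("A", "B", c(1L, 2L, 2L))   # a repeated call overwrites, it does not append

str(cid$get())
```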

Question:

Given a fresh session, executing a small ggparcoord(.) example provided in the documentation of the function:

library(GGally)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))

results in the following plot:

Again, starting in a fresh session and executing the same script with dplyr loaded:

library(GGally)
library(dplyr)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))

results in:

Error: (list) object cannot be coerced to type 'double'

Note that the order of the library(.) statements does not matter.

Questions

  1. Is there something wrong with the code samples?
  2. Is there a way to overcome the problem (e.g. through some namespace functions)?
  3. Or is this a bug?

I need both dplyr and ggparcoord(.) in a bigger analysis, but this minimal example reflects the problem I am facing.

Versions

  • R @ 3.2.3
  • dplyr @ 0.4.3
  • GGally @ 1.0.1
  • ggplot2 @ 2.0.0

UPDATE

To sum up the excellent answer given by Joran:

Answers

  1. The code samples are in fact wrong, as ggparcoord(.) expects a plain data.frame, not a tbl_df like the diamonds data set (whose [ method behaves differently once dplyr is loaded).
  2. The problem is solved by coercing the tbl_df to a data.frame.
  3. No it is not a bug.

Working code sample:

library(GGally)
library(dplyr)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = as.data.frame(diamonds.samp), columns = c(1, 5:10))

Answer:

Converting my comments to an answer...

The GGally package here is making the reasonable assumption that using [ on a data frame should behave the way it always does and always has. However, this all being in the Hadley-verse, the diamonds data set is a tbl_df as well as a data.frame.

When dplyr is loaded, the behavior of [ is overridden such that drop = FALSE is always the default for a tbl_df. So there's a place in GGally where data[,"cut"] is expected to return a vector, but instead it returns another data frame.

...specifically, the error is thrown in your example while attempting to execute:

data[, fact.var] <- as.numeric(data[, fact.var])

Since data[,fact.var] remains a data frame, and hence a list, as.numeric won't work.

As for your conclusion that this isn't a bug, I'd say... maybe. Probably. At least there probably isn't anything the GGally package author ought to do to address it. You just have to be aware that using tbl_dfs with packages not written by Hadley may break things.

As you noted, removing the extra class attributes fixes the problem, as it returns R to using the normal [ method.
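The mechanism can be reproduced in base R. A one-column [ on a plain data.frame drops to a vector, while drop = FALSE (which is effectively what a tbl_df does) keeps a one-column data frame, and as.numeric() then fails with the same kind of error. The small data frame is made up for illustration, and the tibble part only runs if that package is installed. In current versions the [ method for tbl_df appears to live in tibble, so the behaviour may no longer depend on dplyr being loaded.

```r
df <- data.frame(cut = factor(c("Fair", "Good", "Ideal")),
                 price = c(326, 327, 334))

# Plain data.frame: a one-column `[` drops to a vector, so as.numeric() works
v <- df[, "cut"]
codes <- as.numeric(v)            # factor codes 1, 2, 3

# tbl_df-style result, emulated with drop = FALSE: still a data frame (a list)
d <- df[, "cut", drop = FALSE]
err <- tryCatch(as.numeric(d), error = function(e) conditionMessage(e))
print(err)                        # "... cannot be coerced to type 'double'"

# With an actual tibble, as.data.frame() restores the usual dropping behaviour
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  print(is.data.frame(tb[, "cut"]))             # still a (one-column) tibble
  print(is.factor(as.data.frame(tb)[, "cut"]))  # back to a plain factor
}
```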

Question:

I have a pretty dumb question to ask everyone.

I am using ggpairs from GGally to create a correlation matrix, and I found that GGally does not provide a saving function the way ggplot2 does. The function ggsave did not work for a non-ggplot2 object. I tried to use pdf or png, but they did not work. Is there an easy way to save this picture to a local file? Thank you for your kind help.


Answer:

While @CMichael's comment is nice (I didn't know that, hence +1), it's applicable only if you want to save a particular plot from a GGally-generated plot matrix. I believe that you'd like to save the whole plot matrix, a need I've recently run into as well. You can use the standard R approach: open the graphics device for the desired format, print the object, and close the device, which writes the graphics to the file in that format.

# use pdf() instead of svg(), if you want PDF output
svg("myPlotMatrix.svg", height = 7, width = 7)
g <- ggpairs(...)
print(g)
dev.off()
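A self-contained version of this pattern, with iris standing in for the elided ggpairs(...) call and a temporary file path (both illustrative choices). pdf() is used because that device is available on every platform:

```r
library(GGally)

g <- ggpairs(iris, columns = 1:4)

# Open a file device, draw the whole matrix on it, then close it to write the file
out <- file.path(tempdir(), "myPlotMatrix.pdf")
pdf(out, width = 7, height = 7)
print(g)
invisible(dev.off())

file.exists(out)
```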

Question:

With

library(GGally)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1],200),]

# Custom Example
ggpairs(
 diamonds.samp[,1:5],
 mapping = ggplot2::aes(color = cut),
 upper = list(continuous = wrap("density", alpha = 0.5), combo = "box"),
 lower = list(continuous = wrap("points", alpha = 0.3), combo = wrap("dot", alpha = 0.4)),
 diag = list(continuous = wrap("densityDiag")),
 title = "Diamonds"
)

I get

How do I make the diagonal density plots to not be filled, and only show the lines?

Kind of works... but not really.

This is really ugly - in terms of code - because it makes no real sense to me. Also, it does not work here, because it changes the histograms as well.

ggpairs(
  diamonds.samp[,1:5],
  mapping = ggplot2::aes(color = cut),
  upper = list(continuous = wrap("density", alpha = 0.5), combo = "box"),
  lower = list(continuous = wrap("points", alpha = 0.3), combo = wrap("dot", alpha = 0.4)),
  diag = list(continuous = wrap("densityDiag"), mapping = ggplot2::aes(fill=carat)),
  title = "Diamonds"
)


Answer:

The answer to the question can be found in the ggpairs vignette: https://cran.r-project.org/web/packages/GGally/vignettes/ggpairs.html

ggally_mysmooth <- function(data, mapping, ...){
  ggplot(data = data, mapping=mapping) +
    geom_density(mapping = aes_string(color="cut"), fill=NA)
}
ggpairs(
  diamonds.samp[,1:5],
  mapping = aes(color = cut),
  upper = list(continuous = wrap("density", alpha = 0.5), combo = "box"),
  lower = list(continuous = wrap("points", alpha = 0.3), combo = wrap("dot", alpha = 0.4)),
  diag = list(continuous = ggally_mysmooth),
  title = "Diamonds"
)
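Since the colour = cut mapping given to ggpairs is already passed on to the diagonal function, a variant that does not hard-code the column name could simply inherit it. This is a sketch: density_lines is a hypothetical name, and the seed is only there to make the sample reproducible.

```r
library(GGally)

data(diamonds, package = "ggplot2")
set.seed(1)
diamonds.samp <- diamonds[sample(nrow(diamonds), 200), ]

# Hypothetical diag function: the colour comes from the inherited mapping,
# and fill = NA draws outlines only.
density_lines <- function(data, mapping, ...) {
  ggplot(data = data, mapping = mapping) +
    geom_density(fill = NA, ...)
}

pm <- ggpairs(
  diamonds.samp[, 1:5],
  mapping = aes(color = cut),
  diag = list(continuous = density_lines),
  title = "Diamonds"
)
pm
```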

Question:

Do you know how to change the labels in an upper panel in ggpairs (GGally package)? I found how to change the size and font, but not the labels. Here I want to shorten the labels ("set" for setosa, etc.). I tried putting labels=c("set", "ver", "vir") or upper=list(params=list(size=8), labels=c("set", "ver", "vir")), but it doesn't work.

ggpairs(iris, columns=c(2:4), title="variable analysis", colour="Species",
        lower=list(params=list(size=2)), upper=list(params=list(size=8))) 


Answer:

Conceptually the same as @Mike's solution, but in one line.

levels(iris$Species) <- c("set", "ver", "vir")
ggpairs(<...>)

Here's another, more flexible proposal if you have many levels and do not want to abbreviate them by hand: trim levels to a desired length.

levels(iris$Species) <- strtrim(levels(iris$Species), 3)
ggpairs(<...>)

And by the way, the width parameter is also vectorized:

rm(iris)  # drop the modified copy so that iris refers to datasets::iris again
strtrim(levels(iris$Species), c(1, 3, 5))
#[1] "s"     "ver"   "virgi"
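Here is a full call that combines this with the current ggpairs interface, where colour goes through mapping and sizes go through wrap() instead of the deprecated params. It is a sketch that works on a copy, iris2, so the built-in data set stays untouched; the sizes are arbitrary.

```r
library(GGally)

# Work on a copy so the built-in iris is left alone
iris2 <- iris
levels(iris2$Species) <- strtrim(levels(iris2$Species), 3)   # "set" "ver" "vir"

pm <- ggpairs(iris2, columns = 2:4, title = "variable analysis",
              mapping = ggplot2::aes(colour = Species),
              lower = list(continuous = wrap("points", size = 2)),
              upper = list(continuous = wrap("cor", size = 4)))
pm
```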

Question:

I used ggpairs to generate this plot:

And this is the code for it:

#load packages
library("ggplot2")
library("GGally")
library("plyr")
library("dplyr")
library("reshape2")
library("tidyr")


#generate example data
dat <- data.frame(replicate(6, sample(1:5, 100, replace=TRUE)))
dat[,1]<-as.numeric(dat[,1])
dat[,2]<-as.numeric(dat[,2])
dat[,3]<-as.numeric(dat[,3])
dat[,4]<-as.numeric(dat[,4])
dat[,5]<-as.numeric(dat[,5])
dat[,6]<-as.numeric(dat[,6])

#ggpairs-plot
main<-ggpairs(data=dat, 
              lower=list(continuous="smooth", params=c(colour="blue")),
              diag=list(continuous="bar", params=c(colour="blue")), 
              upper=list(continuous="cor",params=c(size = 6)), 
              axisLabels='show',
              title="correlation-matrix",
              columnLabels = c("Item 1", "Item 2", "Item 3","Item 4", "Item 5", "Item 6")) +  theme_bw() +
  theme(legend.position = "none", 
        panel.grid.major = element_blank(), 
        axis.ticks = element_blank(), 
        panel.border = element_rect(linetype = "dashed", colour = "black", fill = NA))
main

However, my goal is to get a plot like this:

This plot is an example, and I produced it with the following three ggplot2 snippets.

I used this for the geom_point plot:

#------------------------
#lower / geom_point with jitter
#------------------------

#dataframe 
df.point <- na.omit(data.frame(cbind(x=dat[,1], y=dat[,2])))

#plot
scatter <- ggplot(df.point,aes(x, y)) +
  geom_jitter(position = position_jitter(width = .25, height= .25)) +
  stat_smooth(method="lm", colour="black") +
  theme_bw() + 
  scale_x_continuous(labels=NULL, breaks = NULL) +
  scale_y_continuous(labels=NULL, breaks = NULL) +
  xlab("") +ylab("")
scatter

this gives the following plot:

I used this for the Barplot:

#-------------------------
#diag. / BARCHART
#------------------------

bar.df<-as.data.frame(table(dat[,1],useNA="no"))

#Barplot
bar<-ggplot(bar.df) + geom_bar(aes(x=Var1,y=Freq),stat="identity") +
  theme_bw() + 
  scale_x_discrete(labels=NULL, breaks = NULL) +
  scale_y_continuous(labels=NULL, breaks = NULL, limits=c(0,max(bar.df$Freq*1.05))) +
  xlab("") +ylab("")
bar

This gives the following plot:

And I used this for the correlation coefficients:

#----------------------
#upper / geom_tile and geom_text
#------------------------

#correlations
df<-na.omit(dat)
df <- as.data.frame((cor(df[1:ncol(df)]))) 
df <- data.frame(row=rownames(df),df) 
rownames(df) <- NULL 

#Tile to plot (as example)
test<-as.data.frame(cbind(1,1,df[2,2])) #F09_a x F09_b
colnames(test)<-c("x","y","var")

#Plot
tile<-ggplot(test,aes(x=x,y=y)) +
  geom_tile(aes(fill=var)) +
  geom_text(data=test,aes(x=1,y=1,label=round(var,2)),colour="White",size=10,show.legend=FALSE) +
  theme_bw() + 
  scale_y_continuous(labels=NULL, breaks = NULL) +
  scale_x_continuous(labels=NULL, breaks = NULL) +
  xlab("") +ylab("") + theme(legend.position = "none")
tile

This gives the following Plot:

My question is: what is the best way to get the plot I want? I want to visualise Likert items from a questionnaire, and in my opinion this is a very nice way to do it. Is it possible to get this with a customized ggpairs plot, without producing every plot on its own as I did? Or is there another way to do this?


Answer:

I don't know about being the best way, it's certainly not easier, but this generates three lists of plots: one each for the bar plots, the scatterplots, and the tiles. Using gtable functions, it creates a gtable layout, adds the plots to the layout, and follows up with a bit of fine-tuning.

EDIT: Added t and p values to the tiles.

# Load packages
library(ggplot2)
library(plyr)
library(gtable)
library(grid)


# Generate example data
dat <- data.frame(replicate(10, sample(1:5, 200, replace = TRUE)))
dat = dat[, 1:6]
dat <- as.data.frame(llply(dat, as.numeric))


# Number of items, generate labels, and set size of text for correlations and item labels
n <- dim(dat)[2]
labels <- paste0("Item ", 1:n)
sizeItem = 16
sizeCor = 4


## List of scatterplots
scatter <- list()

for (i in 2:n) {
  for (j in 1:(i-1)) {
    # Data frame 
    df.point <- na.omit(data.frame(cbind(x = dat[ , j], y = dat[ , i])))

    # Plot
    p <- ggplot(df.point, aes(x, y)) +
      geom_jitter(size = .7, position = position_jitter(width = .2, height = .2)) +
      stat_smooth(method = "lm", colour = "black") +
      theme_bw() + theme(panel.grid = element_blank())

    name <- paste0("Item", j, i)
    scatter[[name]] <- p
  }
}


## List of bar plots
bar <- list()
for (i in 1:n) {
  # Data frame
  bar.df <- as.data.frame(table(dat[ , i], useNA = "no"))
  names(bar.df) <- c("x", "y")

  # Plot
  p <- ggplot(bar.df) + 
    geom_bar(aes(x = x, y = y), stat = "identity", width = 0.6) +
    theme_bw() + theme(panel.grid = element_blank()) +
    ylim(0, max(bar.df$y * 1.05)) 

  name <- paste0("Item", i)
  bar[[name]] <- p
}


## List of tiles
tile <- list()

for (i in 1:(n-1)) {
  for (j in (i+1):n) {
    # Data frame 
    df.point <- na.omit(data.frame(cbind(x = dat[ , j], y = dat[ , i])))

    x <- df.point[, 1]
    y <- df.point[, 2]
    correlation <- cor.test(x, y)
    cor <- data.frame(estimate = correlation$estimate,
                      statistic = correlation$statistic,
                      p.value = correlation$p.value)
    cor$cor <- paste0("r = ", sprintf("%.2f", cor$estimate), "\n", 
                      "t = ", sprintf("%.2f", cor$statistic), "\n",
                      "p = ", sprintf("%.3f", cor$p.value))

    # Plot
    p <- ggplot(cor, aes(x = 1, y = 1)) +
      geom_tile(fill = "steelblue") +
      geom_text(aes(x = 1, y = 1, label = cor),
                colour = "White", size = sizeCor, show.legend = FALSE) +
      theme_bw() + theme(panel.grid = element_blank()) 

    name <- paste0("Item", j, i)
    tile[[name]] <- p
  }
}


# Convert the ggplots to grobs, 
# and select only the plot panels
barGrob <- llply(bar, ggplotGrob)
barGrob <- llply(barGrob, gtable_filter, "panel")

scatterGrob <- llply(scatter, ggplotGrob)
scatterGrob <- llply(scatterGrob, gtable_filter, "panel")

tileGrob <- llply(tile, ggplotGrob)
tileGrob <- llply(tileGrob, gtable_filter, "panel")


## Set up the gtable layout
gt <- gtable(unit(rep(1, n), "null"), unit(rep(1, n), "null"))


## Add the plots to the layout
# Bar plots along the diagonal
for (i in 1:n) {
  gt <- gtable_add_grob(gt, barGrob[[i]], t = i, l = i)
}

# Scatterplots in the lower half
k <- 1
for (i in 2:n) {
  for (j in 1:(i-1)) {
    gt <- gtable_add_grob(gt, scatterGrob[[k]], t = i, l = j)
    k <- k + 1
  }
}

# Tiles in the upper half
k <- 1
for (i in 1:(n-1)) {
  for (j in (i+1):n) {
    gt <- gtable_add_grob(gt, tileGrob[[k]], t = i, l = j)
    k <- k + 1
  }
}


# Add item labels
gt <- gtable_add_cols(gt, unit(1.5, "lines"), 0)
gt <- gtable_add_rows(gt, unit(1.5, "lines"), 2*n)

for (i in 1:n) {
  lab <- textGrob(labels[i], gp = gpar(fontsize = sizeItem)) 
  gt <- gtable_add_grob(gt, lab, t = n+1, l = i+1)
}

for (i in 1:n) {
  lab <- textGrob(labels[i], rot = 90, gp = gpar(fontsize = sizeItem)) 
  gt <- gtable_add_grob(gt, lab, t = i, l = 1)
}


# Add small gap between the panels
for(i in n:1) gt <- gtable_add_cols(gt, unit(0.2, "lines"), i)
for(i in (n-1):1) gt <- gtable_add_rows(gt, unit(0.2, "lines"), i)


# Add chart title
gt <- gtable_add_rows(gt, unit(1.5, "lines"), 0)
lab <- textGrob("Correlation matrix", gp = gpar(fontface = "bold", fontsize = 16)) 
gt <- gtable_add_grob(gt, lab, t = 1, l = 3, r = 2*n + 1)


# Add margins to the whole plot
for (i in c(2*n+1, 0)) {
  gt <- gtable_add_cols(gt, unit(.75, "lines"), i)
  gt <- gtable_add_rows(gt, unit(.75, "lines"), i)
}


# Draw it
grid.newpage()
grid.draw(gt)
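On the second part of the question: in more recent GGally releases each matrix section accepts a function, so it seems possible to get close to this layout with ggpairs itself by supplying three small panel functions. The sketch below assumes a GGally version that exports eval_data_col(), which it uses to pull the x and y values out of the mapping. The function names, seed, and styling are illustrative, and the t and p values from the tiles above are left out for brevity.

```r
library(GGally)   # attaches ggplot2

set.seed(42)
dat <- data.frame(replicate(6, sample(1:5, 100, replace = TRUE)))

# Lower triangle: jittered points plus a linear fit
lower_jitter <- function(data, mapping, ...) {
  ggplot(data, mapping) +
    geom_jitter(width = 0.2, height = 0.2, size = 0.7) +
    geom_smooth(method = "lm", formula = y ~ x, colour = "black")
}

# Diagonal: counts of each answer category (values are whole numbers 1-5)
diag_bar <- function(data, mapping, ...) {
  ggplot(data, mapping) + geom_bar(width = 0.6)
}

# Upper triangle: a filled tile labelled with Pearson's r
upper_tile <- function(data, mapping, ...) {
  x <- GGally::eval_data_col(data, mapping$x)
  y <- GGally::eval_data_col(data, mapping$y)
  r <- cor(x, y, use = "complete.obs")
  ggplot(data.frame(r = r), aes(x = 1, y = 1)) +
    geom_tile(fill = "steelblue") +
    geom_text(aes(label = sprintf("r = %.2f", r)), colour = "white", size = 5)
}

pm <- ggpairs(dat,
              lower = list(continuous = lower_jitter),
              diag = list(continuous = diag_bar),
              upper = list(continuous = upper_tile),
              columnLabels = paste("Item", 1:6),
              title = "correlation-matrix") +
  theme_bw() +
  theme(panel.grid = element_blank())
pm
```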

Question:

I have the following code:

library(GGally)
library(nycflights13)
library(tidyverse)

dat <- nycflights13::flights %>% 
       select(dep_time, sched_dep_time, dep_delay,  arr_time, sched_arr_time, arr_delay)  %>% 
       sample_frac(0.01)
dat
ggpairs(dat)

It produces this:

How can I add the density coloring so that it looks like this:


Answer:

Using ideas from "How to reproduce smoothScatter's outlier plotting in ggplot?", "R - Smoothing color and adding a legend to a scatterplot", and "How to use loess method in GGally::ggpairs using wrap function", you can define your own function to pass to ggpairs.

my_fn <- function(data, mapping, ...){
  p <- ggplot(data = data, mapping = mapping) + 
    stat_density2d(aes(fill = ..density..), geom = "tile", contour = FALSE) +
    scale_fill_gradientn(colours = rainbow(100))
  p
}

ggpairs(dat, lower=list(continuous=my_fn))

EDIT

From comment: How do you add histogram in the diagonal and remove "Corr:" in the correlation value?

You can set the diag and upper arguments. To add a histogram (assuming you mean geom_histogram), use diag=list(continuous=wrap("barDiag", binwidth=100)), and to remove the correlation completely, use upper=list(continuous="blank"). If you only want to remove the text "Corr:", you will need to define a new function; see the function cor_fun in "Change colors in ggpairs now that params is deprecated".

So your plot becomes

ggpairs(dat, lower=list(continuous=my_fn),
        diag=list(continuous=wrap("barDiag", binwidth=100)),
        upper=list(continuous=wrap(cor_fun, sz=10, stars=FALSE))
        )
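Because cor_fun itself is not reproduced here, a minimal stand-in that draws only the coefficient, without the "Corr:" prefix, might look like the following. cor_only is a hypothetical name, mtcars is used so the example is self-contained, and the code assumes a GGally version that exports eval_data_col():

```r
library(GGally)

# Hypothetical replacement for the "cor" panel: prints only the coefficient
cor_only <- function(data, mapping, digits = 2, size = 5, ...) {
  x <- GGally::eval_data_col(data, mapping$x)
  y <- GGally::eval_data_col(data, mapping$y)
  r <- cor(x, y, use = "complete.obs")
  ggplot() +
    annotate("text", x = 0.5, y = 0.5,
             label = formatC(r, format = "f", digits = digits), size = size) +
    theme_void()
}

# Columns 1, 3 and 4 of mtcars are mpg, disp and hp
pm <- ggpairs(mtcars, columns = c(1, 3, 4),
              upper = list(continuous = wrap(cor_only, size = 8)))
pm
```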

EDIT

From comment: How do you color the diagonal histogram like in OP?

To colour the bars, just add the relevant arguments to the barDiag function, in this case fill and colour. So diag would then be:

diag=list(continuous=wrap("barDiag", binwidth=100, fill="brown", col="black")) 

(fill gives the main colour, and col gives the colour to outline the bars)

Question:

I am trying to create a list object that contains GGally plots. These plots are each created with two datasets, the main dataset and a subset of the main dataset to be plotted again in orange. In the MWE below, three plots are created, each comparing two columns from the mtcars data and each containing a different number of subset points to be plotted in orange:

Plot_1: mpg and cyl, 1 orange overlaid point

Plot_2: mpg and disp, 20 orange overlaid points

Plot_3: mpg and hp, 30 orange overlaid points

library(GGally)
library(ggplot2)

data = mtcars
data$ID = rownames(mtcars)
data = data[, c(12,1:11)]

  my_fn <- function(data, mapping, ...){
    xChar = as.character(mapping$x)
    yChar = as.character(mapping$y)
    x = data[,c(xChar)]
    y = data[,c(yChar)]
    p <- ggplot(data, aes(x=x, y=y)) + geom_point() +
      geom_point(data = colorData, aes_string(x=xChar, y=yChar), inherit.aes = FALSE)
    p
  }

  ret=list()
  colorVec = c(1, 10, 20)
  k=1
    for (j in c(3:5)){
      datSel <- cbind(ID=data$ID, data[,c(2, j)])
      datSel$ID = as.character(datSel$ID)
      colorData <- datSel[sample(1:nrow(data), colorVec[k]),]
      p <- ggpairs(datSel[,-1], lower = list(continuous = my_fn), upper = list(continuous = wrap("cor", size = 4))) + theme_gray()
      ret[[paste0("Plot_",j)]] <- p
      k=k+1
    }  

However, when I run this code, and create the ret list object, only the last plot object in the list successfully creates the plot. The first two list objects cannot find one of the columns in the data.

> ret[["Plot_1"]]
Error in FUN(X[[i]], ...) : object 'cyl' not found

> ret[["Plot_2"]]
Error in FUN(X[[i]], ...) : object 'disp' not found

> ret[["Plot_3"]]
Correctly plotted

What might be a painless way to fix this problem? Thank you in advance for sharing advice.

EDIT:

Adding session info for reproducibility:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_2.2.1 GGally_1.3.2 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15       reshape_0.8.7      grid_3.4.3         plyr_1.8.4         gtable_0.2.0      
 [6] magrittr_1.5       scales_0.5.0       pillar_1.2.1       stringi_1.1.6      rlang_0.2.0       
[11] reshape2_1.4.3     lazyeval_0.2.1     labeling_0.3       RColorBrewer_1.1-2 tools_3.4.3       
[16] stringr_1.3.0      munsell_0.4.3      yaml_2.1.17        compiler_3.4.3     colorspace_1.3-2  
[21] tibble_1.4.2

Answer:

A possible solution, if I understood your question correctly:

library(GGally)
data = mtcars
data$ID = rownames(mtcars)
data = data[, c(12,1:11)]

# Load tidyverse
library(tidyverse)

# One row per plot: the variable to pair with mpg and its number of orange points
colorVec <- c(1, 10, 20)
var_list <- data.frame(var = names(data)[3:5], 
                       color = colorVec)

# Function for sampling orange points
my_color_fn <- function(data, color_nb) {
  sample(1:nrow(data), color_nb)
}

# Create a list with a data for each variable with colors
data_list <- apply(var_list, 1, 
                   function(x) 
                     data %>% 
                      select(ID, mpg, as.character(x[["var"]])) %>% 
                      mutate(color = "black") %>% 
                      mutate(color = replace(color, my_color_fn(., x[["color"]]), "orange")))

# Update my_fn: keep the mapping supplied by ggpairs and colour by the `color` column
my_fn <- function(data, mapping, ...){
  p <- ggplot(data, mapping) + 
    geom_point(aes(color = color)) + 
    scale_color_manual("", values = c("black" = "black",
                                      "orange" = "orange"))
  p
}

# Create a function to get ggpairs for each subset
my_fn2 <- function(data)
{
  p <- ggpairs(data %>% select(- ID), 1:2, 
               lower = list(continuous = my_fn), 
               upper = list(continuous = wrap("cor", size = 4)))
  return(p)
}

# Get plot for each list element
ret <- lapply(data_list, function(x) my_fn2(x))

ret[[1]]
ret[[2]]
ret[[3]]
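As for why the original loop fails: ggpairs appears to store the panel functions and call them only when a plot is printed. At that point my_fn looks up colorData in the global environment, where it holds the subset from the last iteration (the mpg and hp columns only), hence the 'cyl' and 'disp' not found errors. A sketch that keeps the original design instead captures each iteration's subset in a closure. make_overlay_fn is a hypothetical name, and mtcars is used directly, without the ID column:

```r
library(GGally)

# Factory: returns a panel function with its own copy of `highlight`
make_overlay_fn <- function(highlight) {
  force(highlight)   # evaluate now, inside the loop, not when the plot is drawn
  function(data, mapping, ...) {
    ggplot(data, mapping) +
      geom_point() +
      geom_point(data = highlight, colour = "orange")   # inherits the x/y mapping
  }
}

set.seed(1)
vars <- c("cyl", "disp", "hp")
colorVec <- c(1, 10, 20)
ret <- list()

for (k in seq_along(vars)) {
  datSel <- mtcars[, c("mpg", vars[k])]
  colorData <- datSel[sample(nrow(datSel), colorVec[k]), ]
  ret[[paste0("Plot_", k)]] <- ggpairs(
    datSel,
    lower = list(continuous = make_overlay_fn(colorData)),
    upper = list(continuous = wrap("cor", size = 4))
  )
}

ret[["Plot_1"]]
```

Each call to make_overlay_fn() gets its own environment, so every list element keeps its own highlight subset. force() makes sure the argument is evaluated during the loop rather than later, when colorData would already point at the last subset.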

Question:

tl;dr can't get a standalone legend (describing common colours across the whole plot) in ggpairs to my satisfaction.

Sorry for length.

I'm trying to draw a (lower-triangular) pairs plot using GGally::ggpairs (an extension package for drawing various kinds of plot matrices with ggplot2). This is essentially the same question as "How to add an external legend to ggpairs()?", but I'm not satisfied with the answer to that question aesthetically, so I'm posting this as an extension (if suggested/recommended by commenters, I will delete this question and offer a bounty on that question instead). In particular, I would like the legend to appear outside the sub-plot frame, either putting it within one virtual subplot but allowing additional width to hold it, or (ideally) putting it in a separate (empty) subplot. As I show below, both of my partial solutions have problems.

Fake data:

set.seed(101)
dd <- data.frame(x=rnorm(100),
                 y=rnorm(100),
                 z=rnorm(100),
                 f=sample(c("a","b"),size=100,replace=TRUE))
library(GGally)

Base plot function:

ggfun <- function(...) {
   ggpairs(dd,mapping = ggplot2::aes(color = f),
    columns=1:3,
    lower=list(continuous="points"),
    diag=list(continuous="blankDiag"),
    upper=list(continuous="blank"),
    ...)
}

Function to trim top/right column:

trim_gg <- function(gg) {
    n <- gg$nrow
    gg$nrow <- gg$ncol <- n-1
    v <- 1:n^2
    gg$plots <- gg$plots[v>n & v%%n!=0]
    gg$xAxisLabels <- gg$xAxisLabels[-n]
    gg$yAxisLabels <- gg$yAxisLabels[-1]
    return(gg)
}

gg0 <- trim_gg(ggfun(legends=TRUE))

Get rid of legends in left column (as in the linked question above):

library(ggplot2)  ## for theme()
for (i in 1:2) {
   inner <- getPlot(gg0,i,1)
   inner <- inner + theme(legend.position="none")
   gg0 <- putPlot(gg0,inner,i,1)
}
inner <- getPlot(gg0,2,2)
inner <- inner + theme(legend.position="right")
gg0 <- putPlot(gg0,inner,2,2)

Problems:

  • the blank panel behind the legend is actually masking some points; I don't know why it's not outside the panel as usual, I assume that's something that ggpairs is doing
  • if it were outside the panel (on top or to the right), I would want to make sure to leave some extra space so the panels themselves were all the same size. However, ggmatrix/ggpairs looks very inflexible about this.

The only alternative I've been able to try so far is following "ggplot separate legend and plot", i.e. extracting the legend and using gridExtra::grid.arrange():

g_legend <- function(a.gplot){
   tmp <- ggplot_gtable(ggplot_build(a.gplot))
   leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
   legend <- tmp$grobs[[leg]]
   return(legend)
}

library(gridExtra)
grid.arrange(getPlot(gg0,1,1),
             g_legend(getPlot(gg0,2,2)),
             getPlot(gg0,2,1),
             getPlot(gg0,2,2)+theme(legend.position="none"),
   nrow=2)

Problems:

  • the axes and labels suppressed by ggpairs are back ...

I also considered creating a panel with a special plot that contained only the legend (i.e. trying to use theme(SOMETHING = element_blank()) to suppress the plot itself), but couldn't figure out how to do it.

As a last resort, I could trim the axes where appropriate myself, but this is practically reinventing what ggpairs is doing in the first place ...


Answer:

With a slight modification of solution 1: first, draw the matrix of plots without their legends (but still with the colour mapping). Second, use your trim_gg function to remove the diagonal spaces. Third, for the plot in the top-left position, draw its legend but position it in the empty space to the right.

data(state)
dd <- data.frame(state.x77,
             State = state.name,
             Abbrev = state.abb,
             Region = state.region,
             Division = state.division) 

columns <- c(3, 5, 6, 7)
colour <- "Region"

library(GGally)
library(ggplot2)  ## for theme()

# Base plot
ggfun <- function(data = NULL, columns = NULL, colour = NULL, legends = FALSE) {
   ggpairs(data, 
     columns = columns,
     mapping = ggplot2::aes_string(colour = colour),
     lower = list(continuous = "points"),
     diag = list(continuous = "blankDiag"),
     upper = list(continuous = "blank"),
    legends = legends)
}

# Remove the diagonal elements
trim_gg <- function(gg) {
    n <- gg$nrow
    gg$nrow <- gg$ncol <- n-1
    v <- 1:n^2
    gg$plots <- gg$plots[v > n & v%%n != 0]
    gg$xAxisLabels <- gg$xAxisLabels[-n]
    gg$yAxisLabels <- gg$yAxisLabels[-1]
    return(gg)
}

# Get the plot
gg0 <- trim_gg(ggfun(dd, columns, colour))

# For plot in position (1,1), draw its legend in the empty panels to the right
inner <- getPlot(gg0, 1, 1)

inner <- inner + 
   theme(legend.position = c(1.01, 0.5), 
         legend.direction = "horizontal",
         legend.justification = "left") +
   guides(colour = guide_legend(title.position = "top"))  

gg0 <- putPlot(gg0, inner, 1, 1)
gg0

Question:

I'm trying to reproduce the figure in https://tgmstat.wordpress.com/2013/11/13/plot-matrix-with-the-r-package-ggally/ with the code

require(GGally)
data(tips, package="reshape")
ggpairs(data=tips, title="tips data", colour = "sex") 

However, in the plot I get the points are not colored based on sex, instead they are all the same color. I get the following warning

Warning message: In warn_if_args_exist(list(...)) : Extra arguments: 'colour' are being ignored. If these are meant to be aesthetics, submit them using the 'mapping' variable within ggpairs with ggplot2::aes or ggplot2::aes_string.

I've tried adding ggplot2::aes(colour = sex), but that did not work either.

Does anyone else here have the same problem? I'm using R version 3.3.1 and GGally_1.2.0.

Thanks.


Answer:

GGally has been under fairly rapid development, so it's not surprising that a blog post from 2013 has out-of-date code. When I run your code with GGally 1.2.0 I get the same warning. It works for me if I add the mapping:

require(GGally)
data(tips, package="reshape")
g1 <- ggpairs(data=tips, title="tips data",
  mapping=ggplot2::aes(colour = sex),
  lower=list(combo=wrap("facethist",binwidth=1)))

The wrap() call follows the GGally wiki page; it stops stat_bin from complaining that binwidth needs to be set.

Question:

My question is twofold:

I have a ggpairs plot with the default upper = list(continuous = "cor"), and I would like to colour the tiles by their correlation values (exactly like ggcorr does).

I have the ggpairs plot from the first call below, and I would like its correlation values to be coloured like the ggcorr plot from the second call:

library(GGally)

sample_df <- data.frame(replicate(7,sample(0:5000,100)))
colnames(sample_df) <- c("KUM", "MHP", "WEB", "OSH", "JAC", "WSW", "gaugings")

ggpairs(sample_df, lower = list(continuous = "smooth"))  
ggcorr(sample_df, label = TRUE, label_round = 2)

I had a brief go at using upper = list(continuous = wrap(ggcorr)) but didn't have any luck and, given that both functions return complete plots, I don't think that's the right path?

I am aware that I could build this in ggplot (e.g. Sandy Muspratt's solution) but given that the GGally package already has the functionality I am looking for I thought I might be overlooking something.


More broadly, I would like to know whether, and how, we can access the correlation values. A simpler option may be to colour the labels rather than the tiles (i.e. this question, using colour rather than size), but I need a variable to assign to colour...

Being able to access the correlation values for use in other plots would be handy, although I suppose I could just recalculate them myself.

Thank you!


Answer:

A possible solution is to get the list of colors from the ggcorr correlation matrix plot and to set these colors as background in the upper tiles of the ggpairs matrix of plots.

library(GGally)   
library(mvtnorm)
# Generate data
set.seed(1)
n <- 100
p <- 7
A <- matrix(runif(p^2)*2-1, ncol=p) 
Sigma <- cov2cor(t(A) %*% A)
sample_df <- data.frame(rmvnorm(n, mean=rep(0,p), sigma=Sigma))
colnames(sample_df) <- c("KUM", "MHP", "WEB", "OSH", "JAC", "WSW", "gaugings")

# Matrix of plots
p1 <- ggpairs(sample_df, lower = list(continuous = "smooth"))  
# Correlation matrix plot
p2 <- ggcorr(sample_df, label = TRUE, label_round = 2)

Printing p2 gives the correlation matrix plot; its tile colours can be pulled out of the plot's grob:

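# Get list of colors from the correlation matrix plot
# (the indices [[6]] and [[3]] depend on the ggplot2/GGally version;
#  inspect str(g2$grobs) if this fails or returns NULL)
library(ggplot2)
g2 <- ggplotGrob(p2)
colors <- g2$grobs[[6]]$children[[3]]$gp$fill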

# Change background color to tiles in the upper triangular matrix of plots 
idx <- 1
for (k1 in 1:(p-1)) {
  for (k2 in (k1+1):p) {
    plt <- getPlot(p1,k1,k2) +
      theme(panel.background = element_rect(fill = colors[idx], color="white"),
            panel.grid.major = element_line(color=colors[idx]))
    p1 <- putPlot(p1,plt,k1,k2)
    idx <- idx+1
  }
}
print(p1)
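Scraping the grob works, but the correlation values can also be computed directly, which answers the second part of the question and avoids the version-dependent grob indices. A minimal sketch, assuming the built-in swiss data stands in for sample_df (so no mvtnorm is needed) and assuming a diverging palette whose endpoint colours only approximate ggcorr's look:

```r
# Correlation matrix computed directly (swiss ships with base R's datasets)
cmat <- cor(swiss)
p <- ncol(cmat)

# Upper-triangle values in row-wise order: (1,2), (1,3), ..., (2,3), ...
# This is the same order in which the k1/k2 loop visits the upper tiles.
r_upper <- t(cmat)[lower.tri(cmat)]

# Diverging palette; the endpoint colours are an assumption, not ggcorr's API
pal <- colorRampPalette(c("#3B9AB2", "#EEEEEE", "#F21A00"))(201)

# Map r in [-1, 1] onto palette index 1..201
tile_colours <- pal[round((r_upper + 1) * 100) + 1]

# Check that element idx matches cell (k1, k2) in the loop order
idx <- 1
for (k1 in 1:(p - 1)) {
  for (k2 in (k1 + 1):p) {
    stopifnot(isTRUE(all.equal(r_upper[idx], cmat[k1, k2])))
    idx <- idx + 1
  }
}
```

With sample_df in place of swiss, tile_colours could replace colors in the loop above, and cmat holds the correlation values themselves for use in other plots. The shades will only match ggcorr's exactly if the palette does.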

Question:

I saw these posts: "GGally::ggpairs plot without gridlines when plotting correlation coefficient" and "use ggpairs to create this plot".

After reading them, I was able to implement this hack: https://github.com/tonytonov/ggally/blob/master/R/gg-plots.r

I think the result is good, but I cannot change the colors.

Here is a MWE:

library(GGally)

# load the hack
source("ggally_mod.R") 
# I saved https://github.com/tonytonov/ggally/blob/master/R/gg-plots.r as "ggally_mod.R"
assignInNamespace("ggally_cor", ggally_cor, "GGally")

ggpairs(swiss)

Now I want to run

ggpairs(swiss, 
 lower=list(continuous="smooth", wrap=c(colour="blue")),
 diag=list(continuous="bar", wrap=c(colour="blue")))

But the colours remain the same. Is there a way to change the colours now that params no longer works?


Answer:

You are not using wrap correctly; see the vignette for details. Also, for the diagonal you now have to use the function barDiag (ggpairs gives very helpful errors that tell you this).

So, for your example, we can change the colour of the points in the lower panels and the fill of the diagonal bars like this:

library(GGally)
library(ggplot2)
ggpairs(swiss[1:3], 
        lower=list(continuous=wrap("smooth", colour="blue")),
        diag=list(continuous=wrap("barDiag", fill="blue")))
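Conceptually, wrap() returns a new panel function with some arguments already filled in, so ggpairs can keep calling it as fn(data, mapping). That is why the arguments belong inside wrap(...) rather than in a separate wrap= list element. The plain-R sketch below mimics the idea; wrap_sketch and fake_panel are hypothetical names, not GGally's implementation (the real wrap() also accepts a string such as "smooth" and looks up the matching ggally_* function).

```r
# Hypothetical stand-in for GGally::wrap(): pre-binds extra arguments
wrap_sketch <- function(fn, ...) {
  fixed_args <- list(...)
  function(data, mapping, ...) {
    # Call-time arguments are appended after the pre-bound ones
    # (giving the same argument in both places would be an error)
    do.call(fn, c(list(data, mapping), fixed_args, list(...)))
  }
}

# Hypothetical panel function that just reports what it received
fake_panel <- function(data, mapping, colour = "black", size = 1) {
  sprintf("%d rows, colour=%s, size=%s", nrow(data), colour, size)
}

blue_panel <- wrap_sketch(fake_panel, colour = "blue")
blue_panel(swiss, mapping = NULL)  # "47 rows, colour=blue, size=1"
```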

However, as the colour of the smooth is hard-coded (see ggally_smooth), to change it you need to define your own function to pass in. So, adapting the approach from here:

my_fn <- function(data, mapping, pts=list(), smt=list(), ...){
  ggplot(data = data, mapping = mapping, ...) + 
    do.call(geom_point, pts) +   # point settings come from the pts list
    do.call(geom_smooth, smt)    # smoother settings come from the smt list
}

# Plot 
ggpairs(swiss[1:4], 
        lower = list(continuous = 
                       wrap(my_fn,
                            pts=list(size=2, colour="red"), 
                            smt=list(method="lm", se=FALSE, size=5, colour="blue"))),
        diag=list(continuous=wrap("barDiag", fill="blue")))

In a similar way, here is how to define a new upper-panel correlation function (similar to what you have):

cor_fun <- function(data, mapping, method="pearson", ndp=2, sz=5, stars=TRUE, ...){

    # Pull the x and y columns named in the mapping
    x <- eval_data_col(data, mapping$x)
    y <- eval_data_col(data, mapping$y)

    corr <- cor.test(x, y, method=method)
    est <- corr$estimate
    # Text size scales with the strength of the correlation
    lb.size <- sz* abs(est) 

    if(stars){
      # rightmost.closed = TRUE keeps p = 1 in the last ("") bin instead of giving NA
      sig <- c("***", "**", "*", "")[findInterval(corr$p.value, c(0, 0.001, 0.01, 0.05, 1),
                                                  rightmost.closed = TRUE)]
      lbl <- paste0(round(est, ndp), sig)
    }else{
      lbl <- round(est, ndp)
    }

    # Place the label at the means of x and y
    ggplot(data=data, mapping=mapping) + 
      annotate("text", x=mean(x, na.rm=TRUE), y=mean(y, na.rm=TRUE), label=lbl, size=lb.size,...)+
      theme(panel.grid = element_blank())
}


ggpairs(swiss, 
        lower=list(continuous=wrap("smooth", colour="blue")),
        diag=list(continuous=wrap("barDiag", fill="blue")),
        upper=list(continuous=cor_fun))
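The star lookup used for the labels can be checked on its own. This standalone sketch reuses the same findInterval() breaks; sig_stars is a hypothetical helper name, and rightmost.closed = TRUE makes a p-value of exactly 1 fall into the last ("") bin instead of producing NA.

```r
# Map p-values to significance stars with the same breaks as the labels above
sig_stars <- function(p) {
  c("***", "**", "*", "")[findInterval(p, c(0, 0.001, 0.01, 0.05, 1),
                                       rightmost.closed = TRUE)]
}

sig_stars(c(0.0005, 0.005, 0.03, 0.2, 1))  # "***" "**" "*" "" ""
```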

Question:

I made a scatterplot matrix with the ggplot2 extension GGally with the following code:

  ggscatmat(dat2, columns = 2:6, color="car", alpha=0.8) +
  ggtitle("Korrelation") + 
  theme(axis.text.x = element_text(angle=-40, vjust=1, hjust=0, size=10))

Now my problem is that in this case I don't really need the density line plot or the correlation coefficients; I only want the scatterplots in the matrix. Is there a way to "delete" the other facets? I can't find anything in the documentation.

Please excuse my bad English, and thanks for any advice or help!

Edit: I found a not yet perfect solution with ggpairs:

ggpairs(dat2, columns = 2:6, mapping= aes(color=car), 
        upper = "blank",diag = "blank") +
  theme(axis.text.x = element_text(angle=-40, vjust=1, hjust=0, size=10))

But now there's no legend anymore, and two of the labels make it look as if the plot hasn't fully loaded yet.


Answer:

You can manually remove parts of the plot by messing about with the gtable:

removePanels <- function(plot) {

  g <- ggplotGrob(plot)

  # get panels to remove: upper triangle + diagonal
  ids <- grep("panel", g$layout$name)
  cols <- sqrt(diff(range(ids)) + 1)
  remove <- matrix(ids, ncol=cols)
  remove <- remove[upper.tri(remove, diag=TRUE)]

  # remove certain axes
  yax <- grep("axis-l", g$layout$name)[1] # first
  xax <- tail(grep("axis-b", g$layout$name), 1) # last

  # remove certain strips
  ystrip <- grep("strip-right", g$layout$name)[1]
  xstrip <- tail(grep("strip-top", g$layout$name), 1)

  # remove grobs
  g$grobs[c(remove, xax, yax, ystrip, xstrip)] <- NULL
  g$layout <- g$layout[-c(remove, xax, yax, ystrip, xstrip),]
  g
}

# draw
library(GGally)
library(ggplot2)
library(grid)

p <- ggscatmat(iris, columns = 1:4, color="Species", alpha=0.8) +
          theme(axis.text.x = element_text(angle=-40, vjust=1, hjust=0, size=10))

grid.newpage()      
grid.draw(removePanels(p))
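The core of removePanels() is the upper.tri() selection. This base-R sketch shows which ids it picks for a 4 x 4 matrix, using made-up contiguous layout ids starting at 2 (an assumption; in a real gtable they come from grep("panel", g$layout$name)). As in removePanels(), the ids are filled into the matrix column by column.

```r
ids <- 2:17                          # hypothetical contiguous panel ids
cols <- sqrt(diff(range(ids)) + 1)   # 4 panels per row/column
m <- matrix(ids, ncol = cols)        # filled column-wise, like removePanels()
remove <- m[upper.tri(m, diag = TRUE)]
keep <- setdiff(ids, remove)         # the lower-triangle panels that survive
print(remove)
print(keep)
```

Whether this matrix lines up with the on-screen grid depends on the order in which the gtable lists its panels, so it is worth checking for a given ggplot2 version.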