Hot questions for using ggplot2 in ggpairs

Question:

I am using ggpairs to make a pairs plot, but I only want to display the lower triangle. I can make the diagonal and upper triangle blank, but I cannot remove them, so the plot keeps an empty top row and an empty right-hand column that I don't want.

Any suggestions?

library("GGally")
ggpairs(iris[, 1:4], 
        lower  = list(continuous = "points"),
        upper  = list(continuous = "blank"),
        diag  = list(continuous = "blankDiag")
        )


Answer:

The ggpairs object can be edited. Most of the object is a list of plots. The unwanted plots can be removed from this list, and the other elements of the ggpairs object adjusted to match.

Here is a function that does this:

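gpairs_lower <- function(g){
  # g$plots is stored row by row, so the first row is positions 1..ncol
  g$plots <- g$plots[-(1:g$ncol)]
  g$yAxisLabels <- g$yAxisLabels[-1]
  g$nrow <- g$nrow - 1

  # drop the last column: every ncol-th plot among the remaining rows
  g$plots <- g$plots[-(seq(g$ncol, length(g$plots), by = g$ncol))]
  g$xAxisLabels <- g$xAxisLabels[-g$ncol]
  g$ncol <- g$ncol - 1

  g
}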

library("GGally")
g <- ggpairs(iris[, 1:4], 
             lower  = list(continuous = "points"),
             upper  = list(continuous = "blank"),
             diag  = list(continuous = "blankDiag")
     )

gpairs_lower(g)
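To see why the two subsetting steps in gpairs_lower() keep exactly the lower triangle, here is a minimal base R sketch that mocks a 4 x 4 grid with plain "row,col" strings instead of plots. It assumes, as the function itself does, that g$plots is stored row by row; the labels are only illustrative.

```r
# Mock a 4 x 4 ggpairs grid: each "plot" is just its "row,col" label,
# stored row by row (assumed to mirror the order of g$plots)
n <- 4
plots <- as.list(paste(rep(1:n, each = n), rep(1:n, times = n), sep = ","))

# Step 1: drop the first row (positions 1..n)
plots <- plots[-(1:n)]

# Step 2: drop the last column (every n-th position among the remaining rows)
plots <- plots[-seq(n, length(plots), by = n)]

unlist(plots)
```

What remains is rows 2 to 4 crossed with columns 1 to 3, i.e. the 3 x 3 block holding the lower triangle.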

Question:

How can I save a ggpairs plot? ggsave does not work with my current code.

Script:

library(GGally)
library(ggplot2)
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1],200),]
pf<-ggpairs(  diamonds.samp[,1:3],mapping = ggplot2::aes(color = cut))
ggsave("C:/Users/top/Desktop/ggpairs.jpg", pf, dpi=500)

Answer:

If you try to use ggsave, you get an error:

ggsave("ggpairs.jpg", pf, dpi=500)

Saving 7 x 7 in image
Error in UseMethod("grid.draw") : 
  no applicable method for 'grid.draw' applied to an object of class "c('gg', 'ggmatrix')"

So you can write your own grid.draw method for the ggpairs object class:

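# ggsave() draws the plot with grid.draw(); this S3 method for class "gg"
# lets it render the ggmatrix simply by printing it
grid.draw.gg <- function(x){
  print(x)
}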

ggsave("ggpairs.jpg", pf, dpi=500)
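If you would rather not define an S3 method, a sketch of another route is to open a graphics device yourself and print the ggmatrix onto it. The output path (a temporary file here), the seed and the 7 x 7 inch size are placeholders; png() or pdf() can be used the same way.

```r
library(GGally)
library(ggplot2)

set.seed(1)
data(diamonds, package = "ggplot2")
diamonds.samp <- diamonds[sample(nrow(diamonds), 200), ]
pf <- ggpairs(diamonds.samp[, 1:3], mapping = aes(color = cut))

out_file <- file.path(tempdir(), "ggpairs.jpg")

# open a JPEG device: 7 x 7 inches at 500 dpi, as in the ggsave() call
jpeg(out_file, width = 7, height = 7, units = "in", res = 500)
print(pf)   # printing the ggmatrix renders it onto the open device
dev.off()   # close the device so the file is written
```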

Question:

Can I use pcor (from ppcor), or indeed any correlation matrix I compute in advance, in ggpairs (in the upper = argument) instead of cor?

I want to integrate within ggpairs a partial correlation matrix or the pcor.

library(GGally)
a <- as.numeric(1:10)
b <- as.numeric(a*a)
c <- as.numeric(a/b)
D <- as.factor(c("A", "B", "C", "A", "B", "C","A", "B", "C","A"))
abcd <- data.frame(a,b,c, D)

p <- ggpairs(abcd, columns = c("a", "b", "c"), title = "All Bivariate analysis", 
           upper = list(continuous = wrap("cor",   size = 6)),
           lower = list(continuous = wrap("smooth", alpha = 0.6, size = 0.1)),      
           mapping = aes(color = D))

for (i in 1:p$nrow) {
  for (j in 1:p$ncol) {
    p[i,j] <- p[i,j] + 
      scale_fill_manual(values=c("grey25", "slategrey", "grey85")) +
      scale_color_manual(values=c("grey37", "slategrey", "grey75"))  
  }
}

d <- p + theme(axis.text.x = element_text(face = "bold", size = 10 ),
             axis.text.y = element_text(face = "bold", size = 10),
             strip.text = element_text(size = 20))
d

I would like to use the fantastic ggpairs, but with a partial correlation matrix. Is it possible? I guess I should do this in this part:

upper = list(continuous = wrap("cor",   size = 6))

Answer:

Looking at the code of GGally::ggpairs, you can see that upper also accepts a function, which has to return a ggplot object. If you provide a function stub like this:

 upper = list(continuous = function(data, mapping) { print(list(data, mapping)) })

You will see that for each panel you get the whole data.frame and an aes mapping describing what should be on the x- and y-axis and which other aesthetics you may have set, for instance:

[[1]]
    a   b         c D
1   1   1 1.0000000 A
2   2   4 0.5000000 B
3   3   9 0.3333333 C
4   4  16 0.2500000 A
5   5  25 0.2000000 B
6   6  36 0.1666667 C
7   7  49 0.1428571 A
8   8  64 0.1250000 B
9   9  81 0.1111111 C
10 10 100 0.1000000 A

[[2]]
Aesthetic mapping: 
* `x`      -> `b`
* `y`      -> `a`
* `colour` -> `D`

From this information we need to:

  1. Calculate the pcor
  2. Extract the relevant coefficients

This is a bit tricky, because we need to calculate a grouped pcor (one coefficient for each level of colour -> D, plus potentially other groupings you may want to include later), and we need to get the grouping structure from the mapping, which is also not that straightforward.
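For the first step, here is a minimal sketch of what the partial correlation computation amounts to. partial_cor() is a hypothetical base R helper (not part of ppcor) that derives partial correlations from the inverse of the covariance matrix. As far as I know this is the same formula ppcor::pcor() uses for its $estimate, so the two should agree; the second step is then just indexing by the panel's x and y column names.

```r
# Hypothetical helper: partial correlations from the precision matrix P,
# pcor_ij = -P_ij / sqrt(P_ii * P_jj), with 1 on the diagonal
partial_cor <- function(x) {
  P <- solve(cov(x))                       # precision (inverse covariance) matrix
  r <- -P / sqrt(diag(P) %o% diag(P))      # standardise the off-diagonal entries
  diag(r) <- 1
  r
}

a <- as.numeric(1:10)
b <- a * a
c <- a / b
pc <- partial_cor(cbind(a, b, c))

# coefficient for a single panel, e.g. x = b, y = a
pc["b", "a"]
```

With only two variables there is nothing to partial out, so the result reduces to the ordinary correlation, which is a handy sanity check.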

To make a long story short, the following stub shows the direction, and you can take it from there to fine-tune the appearance of the upper panels:

library(tidyverse)
library(ppcor)  # provides pcor(), which the panel function below calls
pcor_panel <- function(data, mapping, ...) {
  ## remove x, y mapping
  grp_aes <- mapping[setdiff(names(mapping), c("x", "y"))]
  ## extract the columns to which x and y is mapped
  xy <- sapply(mapping[c("x", "y")], rlang::as_name)
  ## calculate pcor per group
  stats <- data %>%
    group_by(!!!unname(unclass(grp_aes))) %>%
    group_modify(function(dat, grp) {
      res <- pcor(dat)$estimate %>%
        as_tibble() %>%
        setNames(names(dat)) ## needed b/c in pcor names are sometimes messed up
      res <- res %>%
        mutate(x = names(res)) %>%
        gather(y, pcor, -x)
      res %>%
        filter(x == xy[1], y == xy[2]) ## look only at the pcors of this panel
    }) %>% 
    ungroup() %>%
    mutate(x = 1, y = seq_along(y))
  ggplot(stats, aes(x, y, label = round(pcor, 3))) +
    geom_text(grp_aes) +
    ylim(range(stats$y) + c(-2, 2))
}

ggpairs(abcd, columns = c("a", "b", "c"), title = "All Bivariate analysis", 
        upper = list(continuous = pcor_panel),
        lower = list(continuous = wrap("smooth", alpha = 0.6, size = 0.1)),      
        mapping = aes(color = D))

Question:

I have three variables a, b, c. I want to make a ggpairs plot of a and b with each variable (in all of the panels) colored by c. How can I do this?

Code example
library(ggplot2)
library(GGally)
N <- 100
a <- rnorm(N, 0, 1)
b <- rnorm(N, 0, 1)
point.colors <- runif(N, 0, 1)
ggpairs(data=data.frame(a, b)) # How to add point.colors here? 

I can do this using base R pretty easily:

plot(a, b, col=colorRampPalette(c('red', 'blue'))(N)[1+floor(N*point.colors)])

How to do it with ggpairs?

(edit: off-by-one)


Answer:

Why not change the plot within the ggpairs object?

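p = ggpairs(data = data.frame(a,b)) 
# ggplot() equivalent of qplot(a, b, colour = point.colors); qplot() is deprecated in recent ggplot2
p21 = ggplot(data.frame(a, b, point.colors), aes(a, b, colour = point.colors)) + geom_point()
#next line didn't work for user
#p[2,1] = p21
# plots are stored row by row, so panel [2,1] of this 2 x 2 matrix is element 3
p$plots[[3]] = p21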
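A sketch that also reproduces the red-to-blue colouring of the base R plot: scale_colour_gradient() with low = "red" and high = "blue" is assumed here to be a close enough match to colorRampPalette(c('red', 'blue')), and the seed is arbitrary. It replaces the panel with the p[i, j] <- assignment that the code in the pcor question above also uses; if that fails on an older GGally, fall back to p$plots[[3]].

```r
library(GGally)
library(ggplot2)

set.seed(42)
N <- 100
a <- rnorm(N, 0, 1)
b <- rnorm(N, 0, 1)
point.colors <- runif(N, 0, 1)

p <- ggpairs(data.frame(a, b))

# coloured scatter built separately, mapping the numeric values to a gradient
p21 <- ggplot(data.frame(a, b, point.colors), aes(a, b, colour = point.colors)) +
  geom_point() +
  scale_colour_gradient(low = "red", high = "blue")

# replace the lower-left panel (row 2, column 1)
p[2, 1] <- p21
p
```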

Question:

I am using ggpairs from the GGally package.

I need to get a histogram on the diagonal of the ggpairs plot, but want to superimpose a normal density curve using the mean and sd of the data.

I read the help (https://www.rdocumentation.org/packages/GGally/versions/1.4.0/topics/ggpairs) but can't find an option to do it. I guess I must build my own function (myfunct) and then

ggpairs(sample.dat, diag=list(continuous = myfunct))

Has anyone tried this?


I have tried the following:

head(data) 
      x1    x2    x3    x4    x5    x6     F1    F2 
1 -0.749 -1.57 0.408 0.961 0.777 0.171 -0.143 0.345 

myhist = function(data){ 
          ggplot(data, aes(x)) + 
             geom_histogram(aes(y = ..density..),colour = "black") + 
             stat_function(fun = dnorm, args = list(mean = mean(x), sd = sd(x))) 
           } 

ggpairs(sample.data, diag=list(continuous = myhist))

The result is:

Error in (function (data) : unused argument (mapping = list(~x1))


Answer:

This question provides an example of the code to add a normal curve to a histogram in ggplot2. You can use it to write your own function to pass to the diag argument of ggpairs. Note that ggpairs calls that function with both a data and a mapping argument, which is why the one-argument myhist fails with "unused argument (mapping = ...)". To calculate the mean and sd of the data, you can grab the relevant column using, for example, eval_data_col(data, mapping$x). Example below (perhaps a little more complicated than needed, but it allows you to pass parameters to change colours etc. using the wrap functionality).

library(GGally)    

diag_fun <- function(data, mapping, hist=list(), ...){

    X = eval_data_col(data, mapping$x)
    mn = mean(X)
    s = sd(X)

    ggplot(data, mapping) + 
      do.call(function(...) geom_histogram(aes(y = after_stat(density)), ...), hist) +
      stat_function(fun = dnorm, args = list(mean = mn, sd = s), ...)
  }

ggpairs(iris[1:100, 1:4], 
        diag=list(continuous=wrap(diag_fun, hist=list(fill="red", colour="blue"), 
                                  colour="green", lwd=2)))
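As a small illustration of the eval_data_col() step, this sketch evaluates an aes mapping against iris the same way diag_fun() does, giving the raw vector whose mean and sd feed the dnorm() overlay. The mapping is built by hand here; inside ggpairs it is supplied for each panel.

```r
library(GGally)
library(ggplot2)

# a hand-made mapping standing in for the one ggpairs passes to a diag panel
mapping <- aes(x = Sepal.Length)

# evaluate the x aesthetic against the data to get the plotted values
x <- eval_data_col(iris, mapping$x)

# the parameters used for the normal curve
c(mean = mean(x), sd = sd(x))
```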

Question:

I am using the Auto dataset from the ISLR library and the function ggpairs() from the GGally library to create a scatterplot of all possible combinations of variables. My code is the following:

data(Auto)
setDT(Auto)
ggpairs(Auto[, -c("name"), with = FALSE] , 
        lower = list(continuous = wrap("points", color = "red", alpha = 0.5), 
                     combo = wrap("box", color = "orange", alpha = 0.3), 
                     discrete = wrap("facetbar", color = "yellow", alpha = 0.3) ), 
                    diag = list(continuous = wrap("densityDiag",  color = "blue", alpha = 0.5) ))+
     theme(axis.text.x = element_text(angle = 90, hjust = 1))

The plot is the one below:

There are some issues with this plot:

  1. The axis tick labels are not readable. How could I remove the numbers, or possibly rotate the tick labels to be perpendicular to the axes?

  2. How could I enforce different colors for the combo pairs (categorical vs. continuous)?

Your advice will be appreciated.


Answer:

Maybe the proposed solution is not a perfect match with your wishes, but I hope it helps.

  1. You need to load more packages to get the code to work.
  2. You need to convert the categorical variables to factors so that ggpairs treats them as categorical.

The following code may do the trick:

library(ISLR)
library(data.table)
library(GGally)
library(ggplot2)
data(Auto, package = "ISLR")

# remove the name column (column 9) and turn the categorical variables into factors
Auto2 <- Auto[, -9]
Auto2$cylinders <- factor(Auto2$cylinders)
Auto2$origin <- factor(Auto2$origin)

ggpairs(Auto2 , 
        lower = list(continuous = wrap("points", color = "red", alpha = 0.5), 
                     combo = wrap("box", color = "orange", alpha = 0.3), 
                     discrete = wrap("facetbar", color = "yellow", alpha = 0.3) ), 
        diag = list(continuous = wrap("densityDiag",  color = "blue", alpha = 0.5) ))

This yields the following picture:

Please let me know whether this is what you want.
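The code above does not address the first issue (unreadable tick labels). A possible sketch, using iris as a stand-in so that ISLR is not needed: adding a theme() to the ggmatrix either hides the tick numbers and tick marks entirely, or keeps them and rotates the x-axis numbers by 90 degrees.

```r
library(GGally)
library(ggplot2)

# option 1: hide tick numbers and tick marks in every panel
p <- ggpairs(iris, columns = 1:4) +
  theme(axis.text = element_blank(), axis.ticks = element_blank())

# option 2: keep the numbers but turn the x-axis ones vertical
p_rot <- ggpairs(iris, columns = 1:4) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
```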

Question:

I am using ggpairs, and when plotting the matrix I get the following:

As you can see, some of the text is long and hence not shown completely. Is there any way I can wrap the text so that it is fully visible?

Code

ggpairs(df) 

I want the text to wrap so that it looks something like this:


Answer:

You can use the labeller argument of ggpairs to pass a function to be applied to the facet strip text.

ggplot2 has a ready-made function, label_wrap_gen(), that wraps long labels.

By default ggpairs uses the column names as labels, and syntactic column names can't contain spaces, while label_wrap_gen() needs spaces to split a label over multiple lines. The solution below therefore passes space-separated versions of the names through columnLabels.

This is a solution:

library(ggplot2)
library(GGally)
df <- iris

colnames(df) <- make.names(c('Long colname', 
                  'Quite long colname', 
                  'Longer than usual colname',
                  'I\'m not even sure this should be a colname',
                  'The ever longest colname that one should be allowed to use'))

# turn the dots inserted by make.names() back into spaces, then wrap at ~10 characters
ggpairs(df, 
        columnLabels = gsub('.', ' ', colnames(df), fixed = TRUE), 
        labeller = label_wrap_gen(10))
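The label transformation can be checked without drawing anything. This base R sketch repeats the make.names() and gsub() steps and then wraps with strwrap(), which is, as far as I know, roughly what label_wrap_gen(10) applies to each strip label.

```r
raw <- c("Long colname", "Quite long colname")

# make.names() replaces the spaces with dots
syntactic <- make.names(raw)

# gsub() with fixed = TRUE turns the literal dots back into spaces
pretty <- gsub(".", " ", syntactic, fixed = TRUE)

# break each label at spaces into lines narrower than 10 characters
wrapped <- vapply(strwrap(pretty, width = 10, simplify = FALSE),
                  paste, character(1), collapse = "\n")
wrapped
```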

Question:

I get the error below when trying to plot the dat data frame:

library(GGally)
library(ggplot2)
dat = data.frame(a=rnorm(5) , b= rnorm(5) ,c =rnorm(5) , d=rnorm(5) , e= c(1,2,3,4,5))
dat

           a          b          c           d e
1  0.21444531  1.9972134  2.1988103 -0.47624689 1
2 -0.32468591  0.6007088  1.3124130 -0.78860284 2
3  0.09458353 -1.2512714 -0.2651451 -0.59461727 3
4 -0.89536336 -0.6111659  0.5431941  1.65090747 4
5 -1.31080153 -1.1854801 -0.4143399 -0.05402813 5

ggpairs(dat  ,mapping=aes(color =e),upper=list(continuous=wrap("cor",size=2)), columns = c("a","b","c","d"))

Error:

Error in `$<-.data.frame`(`*tmp*`, "label", value = ": ") : replacement has 1 row, data has 0

I would like to color the data points using column "e"

Any ideas?


Answer:

If you convert e to a factor, it runs:

dat$e <- factor(dat$e)
ggpairs(dat,mapping=aes(color=e),upper=list(continuous=wrap("cor",size=2)), columns = c("a","b","c","d"))

But that is a pretty ugly figure, not to mention a useless comparison: e takes a different value in each of the five rows, so every colour group contains a single point.

If you drop the colour mapping, the code also runs fine:

ggpairs(dat,upper=list(continuous=wrap("cor",size=2)), columns = c("a","b","c","d"))
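If the goal is still to colour the points by the numeric e, one possible sketch is to leave colour out of the global mapping (so the cor panels compute a single overall correlation) and add it only in the lower panels through a custom panel function. points_by_e() is a hypothetical helper; it relies on ggpairs passing each panel the whole data frame, as the printed output in the pcor answer above shows, so e is available even though it is not one of the plotted columns.

```r
library(GGally)
library(ggplot2)

set.seed(1)
dat <- data.frame(a = rnorm(5), b = rnorm(5), c = rnorm(5), d = rnorm(5), e = 1:5)

# hypothetical lower-panel function: scatter coloured by the numeric e
points_by_e <- function(data, mapping, ...) {
  ggplot(data, mapping) +
    geom_point(aes(colour = e), ...)
}

p <- ggpairs(dat, columns = c("a", "b", "c", "d"),
             lower = list(continuous = points_by_e),
             upper = list(continuous = wrap("cor", size = 2)))
```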

Question:

Is it possible to change the labels of factor levels without changing the values in the data.frame?

For example, in the following graph, can I change the labels Female and Male to F and M respectively without changing the df?

library(GGally)
data(tips, package = "reshape")
pm <- ggpairs(tips, 1:3, columnLabels = c("Total Bill", "Tip", "Sex"))
pm


Answer:

After creating the plot with

pm <- ggpairs(tips, 1:3, columnLabels = c("Total Bill", "Tip", "Sex"))

relabel the factor levels in the copy of the data stored inside the ggpairs object:

levels(pm$data$sex)[levels(pm$data$sex) == "Male"] = "M"
levels(pm$data$sex)[levels(pm$data$sex) == "Female"] = "F"

You'll get this plot:

It won't change anything in the tips dataset:

head(tips)

 total_bill  tip    sex smoker day   time size
1     16.99 1.01 Female     No Sun Dinner    2
2     10.34 1.66   Male     No Sun Dinner    3
3     21.01 3.50   Male     No Sun Dinner    3
4     23.68 3.31   Male     No Sun Dinner    2
5     24.59 3.61 Female     No Sun Dinner    4
6     25.29 4.71   Male     No Sun Dinner    4
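Why tips stays untouched: ggpairs keeps the data inside the returned object (pm$data), and R copies an object when it is modified, so relabelling the levels of that copy never reaches the original. A base R sketch with a small stand-in factor (not the actual tips data) shows the same behaviour.

```r
# stand-in for tips$sex
sex <- factor(c("Female", "Male", "Male", "Female"))

# a second reference to the same values, like pm$data$sex
relabelled <- sex
levels(relabelled)[levels(relabelled) == "Male"] <- "M"
levels(relabelled)[levels(relabelled) == "Female"] <- "F"

levels(sex)          # the original keeps its levels
levels(relabelled)   # only the copy is relabelled
as.character(relabelled)
```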

Question:

I would like to generate a correlation plot pairing my "true" variable (Salary) with each of the rest (the People variables). I am pretty sure this has been brought up before, but the solutions I have found do not work for me.

library(ggplot2)
set.seed(0)

dt = data.frame(matrix(rnorm(120, 100, 5), ncol = 6) )
colnames(dt) = c('Salary', paste0('People', 1:5))
ggplot(dt, aes(x=Salary, y=value)) +
  geom_point() + 
  facet_grid(.~Salary)

This gives the error: Error: Column y must be a 1d atomic vector or a list.

I know one solution is to write out all of the variables in y, which I am trying to avoid because my real data has 15 columns.

Also, I am not entirely sure what "value" and "variable" refer to in ggplot. I see them a lot in example code.

Any suggestion is appreciated!


Answer:

You want to convert your data from wide to long format, using tidyr::gather() for example. Here is a solution using packages from the tidyverse:

library(tidyr)
library(ggplot2)
theme_set(theme_bw(base_size = 14))

set.seed(0)
dt = data.frame(matrix(rnorm(120, 100, 5), ncol = 6) )
colnames(dt) = c('Salary', paste0('People', 1:5))

### convert data frame from wide to long format
dt_long <- gather(dt, key, value, -Salary)
head(dt_long)
#>      Salary     key     value
#> 1 106.31477 People1  98.87866
#> 2  98.36883 People1 101.88698
#> 3 106.64900 People1 100.66668
#> 4 106.36215 People1 104.02095
#> 5 102.07321 People1  99.71447
#> 6  92.30025 People1 102.51804

### plot
ggplot(dt_long, aes(x = Salary, y = value)) +
  geom_point() +
  facet_grid(. ~ key) 

### if you want to add regression lines
library(ggpmisc)

# define regression formula
formula1 <- y ~ x

ggplot(dt_long, aes(x = Salary, y = value)) +
  geom_point() +
  facet_grid(. ~ key) +
  geom_smooth(method = 'lm', formula = formula1, se = TRUE) +
  stat_poly_eq(aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~")), 
               label.x.npc = "left", label.y.npc = "top",
               formula = formula1, parse = TRUE, size = 3) +
  coord_equal()

### if you also want ggpairs() from the GGally package
library(GGally)
ggpairs(dt)

Created on 2019-02-28 by the reprex package (v0.2.1.9000)
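On the "value"/"variable" question: those are simply the column names produced when a wide data frame is reshaped to long format (reshape2::melt() names them variable and value by default; gather() above uses the key and value names you pass). In current tidyr, pivot_longer() supersedes gather(); a sketch of the equivalent reshaping step, assuming tidyr 1.0.0 or later:

```r
library(tidyr)

set.seed(0)
dt <- data.frame(matrix(rnorm(120, 100, 5), ncol = 6))
colnames(dt) <- c('Salary', paste0('People', 1:5))

# stack every People column: its name goes to `key`, its numbers to `value`;
# Salary is repeated alongside so it can be plotted against each of them
dt_long <- pivot_longer(dt, -Salary, names_to = "key", values_to = "value")
head(dt_long)
```

The rows come out ordered by original row rather than by column, but the content matches the gather() result, so the same ggplot(..., aes(x = Salary, y = value)) + facet_grid(. ~ key) call applies.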