Hot questions for using ggplot2 in ggfortify

Question:

Using autoplot from ggfortify to create diagnostic plots:

library(ggplot2)
library(ggfortify)

mod <- lm(Petal.Width ~ Petal.Length, data = iris)
autoplot(mod, label.size = 3)

Is it possible to change the axis and plot titles (easily)? I'd like to translate them.


Answer:

The function autoplot.lm returns an S4 object (class ggmultiplot, see ?`ggmultiplot-class`). If you look at the help file, you'll see that the class has replacement methods for individual plots. That means you can extract an individual plot, modify it, and put it back. For example:

library(ggplot2)
library(ggfortify)

mod <- lm(Petal.Width ~ Petal.Length, data = iris)
g <- autoplot(mod, label.size = 3) # store the ggmultiplot object

# new x and y labels
xLabs <- yLabs <- c("a", "b", "c", "d")

# loop over all plots and modify each individually
for (i in 1:4)
    g[i] <- g[i] + xlab(xLabs[i]) + ylab(yLabs[i])

# display the new plot
print(g) 

Here I only modified the axis labels, but you can change anything about the plots individually (themes, colors, titles, sizes).
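The extract, modify, put-back pattern works because the class defines S4 "[" and "[<-" methods. As a base-R sketch of that mechanism only, the hypothetical ToyMulti class below (it is not part of ggfortify) stores plain strings instead of ggplot objects, so it runs without any plotting packages:

```r
library(methods)

# Hypothetical stand-in for ggmultiplot: a container holding a list of "plots"
setClass("ToyMulti", slots = c(plots = "list"))

# Extraction: obj[i] returns the i-th stored element
setMethod("[", signature(x = "ToyMulti", i = "numeric"),
          function(x, i, j, ..., drop = TRUE) x@plots[[i]])

# Replacement: obj[i] <- value overwrites the i-th stored element
setReplaceMethod("[", signature(x = "ToyMulti", i = "numeric"),
                 function(x, i, j, ..., value) {
                   x@plots[[i]] <- value
                   x
                 })

g <- new("ToyMulti", plots = list("plot 1", "plot 2", "plot 3", "plot 4"))

# Same loop shape as in the answer: extract each element, modify it, put it back
for (i in 1:4) {
  g[i] <- paste(g[i], "(translated)")
}
```

With ggfortify, the elements are ggplot objects, so the modification step is adding layers such as xlab() or ggtitle() instead of pasting text.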

Question:

I am running a principal component analysis with a varimax rotation and wish to display the plot, which seems simple enough. However, my loading vectors are very close in some places, and their labels tend to overlap. That is where ggrepel comes in, to separate the labels. My dilemma now is figuring out how to connect the two. I used autoplot, which automatically adds the desired text, and that makes it difficult to define which text to repel. There may be other ways of going about it, and I am open to suggestions. Below are one of my attempts to repel the labels, followed by my code that works but has overlap.

autoplot(prcomp(built.df9),
loadings = TRUE, loadings.colour = 'blue', loadings.label = TRUE, 
loadings.label.size = 4, loading.label.color = 'red') +
ggtitle(label = "Principal Component Analysis and Varimax Rotation for Built 
Environment Indicators") +
geom_text_repel(aes(label = rownames(prcomp(built.df9))))

autoplot(prcomp(built.df9),
loadings = TRUE, loadings.colour = 'blue', loadings.label = TRUE, 
loadings.label.size = 4, loading.label.color = 'red') +
ggtitle(label = "Principal Component Analysis and Varimax Rotation for Built 
Environment Indicators")

Answer:

You can use the loadings.label.repel = TRUE argument of ggfortify's autoplot().

This example uses your same code, just with the mtcars dataset.

Without repelled labels:

library(ggplot2)
library(ggfortify)

autoplot(prcomp(mtcars),
         loadings = TRUE, loadings.colour = 'blue', loadings.label = TRUE, 
         loadings.label.size = 4, loadings.label.colour = 'red') +  # "loadings.label.colour", not "loading.label.color"
  ggtitle(label = "Principal Component Analysis and Varimax Rotation for Built\nEnvironment Indicators")

With repelled labels:

autoplot(prcomp(mtcars),
         loadings = TRUE, loadings.colour = 'blue', loadings.label = TRUE, 
         loadings.label.size = 4, loadings.label.colour = 'red',
         loadings.label.repel = TRUE) +  # repel the loading labels
  ggtitle(label = "Principal Component Analysis and Varimax Rotation for Built\nEnvironment Indicators")

Question:

How can I add the red dashed contour lines that show the Cook's distance in this first plot to the second plot using ggplot and ggfortify?

Code used:

library(ggfortify)
model <- glm(mpg ~ wt, data = mtcars, family = gaussian())
plot(model, which = 5) # first plot
autoplot(model, which = 5) # second plot

I think that geom_contour could be added, but I do not know the formula used to calculate the Cook's distance lines.


Answer:

After some research, I managed to plot the Cook's distance contour for a given level using the formula sqrt(level * length(coef(model)) * (1 - leverage)/leverage), which is what R's plot.lm uses to draw its contours. The method I used can definitely be improved, though.

library(ggplot2)
library(ggfortify)
model <- glm(mpg ~ wt, data = mtcars, family = gaussian())

# Standardized residual at which Cook's distance equals `level` (upper and lower contour)
cd_cont_pos <- function(leverage, level, model) {
  sqrt(level * length(coef(model)) * (1 - leverage) / leverage)
}
cd_cont_neg <- function(leverage, level, model) -cd_cont_pos(leverage, level, model)

autoplot(model, which = 5) +
  stat_function(fun = cd_cont_pos, args = list(level = 0.5, model = model),
                xlim = c(0, 0.25), lty = 2, colour = "red") +
  stat_function(fun = cd_cont_neg, args = list(level = 0.5, model = model),
                xlim = c(0, 0.25), lty = 2, colour = "red") +
  scale_y_continuous(limits = c(-2, 2.5))
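The contour formula follows from writing Cook's distance in terms of the standardized residual r, the leverage h and the number of coefficients p: D = r^2 * h / (p * (1 - h)). Setting D equal to level and solving for r gives r = ±sqrt(level * p * (1 - h) / h), which is what cd_cont_pos and cd_cont_neg compute. The base-R sketch below checks that identity against cooks.distance() for the same gaussian glm; it does not need ggplot2 or ggfortify.

```r
# Same model as in the question
model <- glm(mpg ~ wt, data = mtcars, family = gaussian())

h <- hatvalues(model)        # leverages
r <- rstandard(model)        # standardized residuals
p <- length(coef(model))     # number of coefficients (2 here)

# Cook's distance rebuilt from r and h, compared with R's own computation
d_manual <- r^2 * h / (p * (1 - h))
all.equal(unname(d_manual), unname(cooks.distance(model)))

# Contour function from the answer
cd_cont_pos <- function(leverage, level, model) {
  sqrt(level * length(coef(model)) * (1 - leverage) / leverage)
}

# A point lying exactly on the level-0.5 contour has Cook's distance 0.5
lev <- 0.1
r_on_contour <- cd_cont_pos(lev, level = 0.5, model = model)
d_on_contour <- r_on_contour^2 * lev / (p * (1 - lev))
d_on_contour
```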

Question:

I would like to plot PC2 against PC3 using the function autoplot() of the package ggfortify. By default just PC1 and PC2 are shown:

library(ggfortify)
myPCA <- prcomp(iris[-5])
autoplot(myPCA)

I can get what I want by reordering and renaming columns in the prcomp object:

myPCAtrunc <- myPCA
myPCAtrunc[[1]] <- myPCAtrunc[[1]][c(2,3,1,4)]
myPCAtrunc[[2]] <- myPCAtrunc[[2]][,c(2,3,1,4)]
colnames(myPCAtrunc[[2]]) <- c("PC1","PC2","PC3","PC4") # fake names
myPCAtrunc[[5]] <- myPCAtrunc[[5]][,c(2,3,1,4)]
colnames(myPCAtrunc[[5]]) <- c("PC1","PC2","PC3","PC4") # fake names
autoplot(myPCAtrunc, xlab = "PC2", ylab="PC3")

I know it is correct, because it is the same as plot(myPCA$x[, c(2,3)]).

But there must be a cleaner way to solve it. Some ideas?


Answer:

This issue was recently solved in ggfortify: autoplot() accepts x and y arguments that select the principal components to plot.

autoplot(myPCA,    # your prcomp object
         x = 2,    # PC2
         y = 3)    # PC3

Question:

I would like to be able to adjust the positions of the loading labels so that they do not fall on top of the arrows. However, I do not know where the adjustments need to be made. geom_text can be used to adjust the positions of the site labels, but I cannot find where the vectors are stored in str(g).

library(ggplot2)
library(ggfortify)
df <- data.frame(replicate(10,sample(-10:10,10,rep=TRUE)))
names(df) <- c('up','down','left','right','circle','square','triangle','x','r1','l1')
rownames(df) <- paste('Dummy Site', seq(0,9,1))
g <- autoplot(prcomp(df[,-11], scale=TRUE), data=df,
              loadings.label=TRUE, loadings=TRUE, 
              loadings.label.size=8, loadings.colour='blue',
              label.size=5) +
     geom_text(vjust=-1, label=rownames(df)) +
     theme(plot.background=element_blank(),
           panel.background=element_rect(fill='transparent',color='black',size=1),
           legend.text=element_text(hjust=1),
           legend.key=element_blank()) 
g

I've looked in ggplot2::theme and I've examined the help docs for autoplot, but can't find any mention of adjusting the label position. Bonus points if it can adjust based on the direction of the arrow, but a static adjustment would be acceptable.

Currently, here is what the plot looks like:


Answer:

You can get the arrow coordinates with layer_data(g, 2). However, autoplot() on a prcomp object passes its other arguments on to ggbiplot(), so you can change the position of the point labels and the loading labels with ggbiplot() arguments such as loadings.label.hjust and loadings.label.vjust (see ?ggbiplot).

example code:
arrow_ends <- layer_data(g, 2)[, c("xend", "yend")]  # end points of the loading arrows, selected by name

autoplot(prcomp(df[,-11], scale=TRUE), data=df,
         loadings.label=TRUE, loadings=TRUE, 
         loadings.label.size=8, loadings.colour='blue',
         label.size=5, loadings.label.vjust = 1.2) +     # change loadings.label position
     geom_point(data = arrow_ends, aes(xend, yend), size = 3) +  # the coordinates from layer_data(...)
     geom_text(vjust=-1, label=rownames(df)) +
     theme(plot.background=element_blank(),
           panel.background=element_rect(fill='transparent',color='black',size=1),
           legend.text=element_text(hjust=1),
           legend.key=element_blank()) 

Question:

I created a graph with the autoplot function using the mtcars data and got a graph like this:

Here is my code:

library(cluster)
library(NbClust)
library(ggplot2)
library(ggfortify)
x <- mtcars
number.cluster <- NbClust(x, distance = "euclidean", min.nc = 1, max.nc = 5, method = "complete", index = "ch")
best.cluster <- as.numeric(number.cluster$Best.nc[1])
x.pam <- pam(x, best.cluster)
autoplot(x.pam, data = x, frame = T) + ggtitle("PAM MTCARS")

My question is: how do I get the PC1 and PC2 coordinates of the data shown in this graph? Thank you.


Answer:

You can use layer_data() to get the data used by each layer of a ggplot object:

p <- autoplot(x.pam, data = x, frame = T) + ggtitle("PAM MTCARS")
layer_data(p, 1L) # coordinates of all points
layer_data(p, 2L) # coordinates of points that contribute to polygons

Question:

I want to plot graphs for various Forecast models.

When I use autoplot after loading ggplot2, the plot appears like this:

autoplot(m_hw1_ff)

I also want to add the fitted lines for training and test data. For that I am using the below code:

autoplot(m_hw1_ff) + 
  geom_line(aes(y=m_reg1_ff$fitted), col = "green") +
  geom_line(data=test_ts_data, aes(y=test_ts_data), col = "red")

When the above code is run after just loading ggplot2, it gives the following error:

Error in order(data$PANEL, data$group, data$x) : 
  argument 3 is not a vector

After referring to the comments and answers on this question, I loaded the ggfortify package as well.

forecast v7 & ggplot2 graphics adding fitted line to autoplot

The code works fine after that and the fitted lines for training and test data are plotted perfectly. However the shaded region, which was previously blue (dark and light for Lo 80, Hi 80, Lo 95 and Hi 95) has turned grey completely as in the graph below:

I want the shaded region to appear as it did in the first graph.


Answer:

There are several issues with your code.

The first plot is drawn by

forecast:::autoplot.forecast

the autoplot method for forecast objects from the forecast package.

When you load ggfortify, that method is masked by

ggfortify:::autoplot.forecast

which is why the plots behave differently.

My recommendation is to convert the prediction objects to data frames and plot using ggplot. This will allow a much higher level of customization. Example:

library(forecast)
library(ggfortify)

d.arima <- auto.arima(AirPassengers)
d.forecast <- forecast(d.arima,  h = 50)

Create a data frame for plotting:

for_plot <- ggfortify:::fortify.forecast(d.forecast,
                                         ts.connect = TRUE)

You can also just do:

for_plot <- fortify(d.forecast,
                    ts.connect = TRUE)

after loading ggfortify.

I wrote it with the explicit ggfortify::: prefix above so you can see which function is called.

The for_plot object is a data frame, but it is not in the long format that ggplot prefers, nor in a format that is easy to convert to long format. It is still manageable:

Example without conversion to long format (the ggplot heretic way):

ggplot(data = for_plot) +
  geom_line(aes(x= Index, y = Data, color = "raw")) +
  geom_line(aes(x= Index, y = Fitted, color = "fitted")) +
  geom_line(aes(x= Index, y = `Point Forecast`, color = "point forecast")) +
  geom_ribbon(aes(x= Index, ymin = `Lo 80`, ymax = `Hi 80`,  fill = "80"),  alpha = 0.2) +
  geom_ribbon(aes(x= Index, ymin = `Lo 95`, ymax = `Hi 95`,  fill = "95"),  alpha = 0.2) +
  scale_fill_manual("what", values = c("blue", "dodgerblue"))+
  scale_color_manual("why", values = c("blue", "red", "green"))

The ggplot way would be to split the data into two data frames, one for plotting the ribbons and the other for plotting the lines, convert each to long format, and then plot. Something like this:

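library(tidyverse)

# Long format for the lines: one row per (Index, series)
for_plot_lines <- for_plot %>%
  pivot_longer(c(Data, Fitted, `Point Forecast`), names_to = "key", values_to = "value") %>%
  select(key, value, Index)

# Long format for the ribbons: one row per (Index, level), keeping each "Lo xx"
# paired with its matching "Hi xx" (two successive gather() calls would instead
# cross every Lo column with every Hi column)
for_plot_ribbon <- for_plot %>%
  filter(!is.na(`Point Forecast`)) %>%
  select(Index, `Lo 80`, `Hi 80`, `Lo 95`, `Hi 95`) %>%
  pivot_longer(-Index, names_to = c(".value", "level"), names_sep = " ")

ggplot(data = for_plot_lines) +
  geom_line(aes(x = Index, y = value, color = key)) +
  geom_ribbon(data = for_plot_ribbon,
              aes(x = Index, ymin = Lo, ymax = Hi, fill = level), alpha = 0.2)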

Question:

library(ggfortify)

With ggfortify, if I plot one time series, I can set the line colour as follows:

autoplot(myts1,ts.colour='blue')

I can plot two ts objects in one graph:

autoplot(cbind(myts1,myts2),facets=FALSE)

But how can I set for example the line colour for the first ts 'blue' and for the second 'red'? In the second example, ts.colour doesn't work at all.

Edit: here is a reproducible example:

myts1 = stats::filter(rnorm(100), filter=rep(1,20),circular=TRUE)  # stats:: avoids masking by dplyr::filter
myts2 = sin(seq(0,20,length.out=100))*5+5
autoplot(cbind(myts1,myts2),facets=FALSE)

Answer:

You can use scale_colour_manual.

When facets are disabled, autoplot colours each series by the "variable" column, so you can simply add scale_colour_manual():

data(Canada, package = "vars")  # the Canada data set comes from the vars package
series_colours = c('red', 'blue', 'green', 'orange')
autoplot(Canada, facets = FALSE, size = 3) + scale_colour_manual(values = series_colours)

Otherwise (with facets enabled), you must set ts.colour = 'variable' explicitly to colour each series:

autoplot(Canada, size = 3, ts.colour = 'variable') + scale_colour_manual(values = series_colours)

Question:

I'm trying to change the facet labels for an stl decomposition plot like the following:

library(ggplot2)
library(ggfortify)
p <- autoplot(stl(AirPassengers, s.window = 'periodic'), ts.colour = "black", ts.size = 0.2)
p

The plot originates from the ggfortify package. I wish to change the facet labels to:

c("Original Data", "Seasonal component", "Trend component", "Remainder")

I've tried to get into the structure of a ggplot (a lot of str'ing), and found that the following stores these names:

str(p$layers[[1]]$data$variable)
# Factor w/ 4 levels "Data","seasonal",..: 1 1 1

However, when I change this factor in place, I get four empty plots followed by the proper plots:

p$layers[[1]]$data$variable <- factor(p$layers[[1]]$data$variable,
                                      labels=c("Original series", "Seasonal Component", "Trend component", "Remainder"))    

How do I change the facet labels without getting these empty plots at the top?


Answer:

One possibility is to change the relevant components of the plot object.

# generate plot data which can be rendered
g <- ggplot_build(p)

# inspect the object and find the relevant element to be changed
# str(g)

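# perform desired changes (ggplot2 >= 2.2.0 stores the panel layout in
# g$layout$layout; older versions used g$panel$layout)
g$layout$layout$variable <- c("Original Data", "Seasonal component", "Trend component", "Remainder")

# build a grob and 'draw' it
library(grid)  # provides grid.draw()
grid.draw(ggplot_gtable(g))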

Question:

I am using the k-means algorithm in R to separate variables. I would like to plot the results in ggplot, which I managed to do; however, the results seem to differ between ggplot and cluster::clusplot.

So I wanted to ask what I am missing. For example, I know that the scaling is different, but I was wondering why all points lie inside the bounds when using clusplot but not when using ggplot.

Is it just because of the scaling?

So are the two results below exactly the same?

library(cluster)
library(ggfortify)


x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
           matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")
x <- data.frame(x)

A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)
cluster::clusplot(cbind(x$x, x$y), A$cluster, color = T, shade = T)
autoplot(kmeans(x, centers = 3, nstart = 50, iter.max = 500), data = x, frame.type = 'norm')

Answer:

For me, I get the same plot using either clusplot or ggplot. But to use ggplot, you first have to run a PCA on your data to get the same plot as clusplot. Maybe that is where your issue is.

Here, with your example, I did:

x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
           matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
colnames(x) <- c("x", "y")
x <- data.frame(x)

A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)
cluster::clusplot(cbind(x$x, x$y), A$cluster, color = T, shade = T)

library(ggplot2)
pca_x = princomp(x)
x_cluster = data.frame(pca_x$scores, A$cluster)  # columns Comp.1, Comp.2, A.cluster
ggplot(x_cluster, aes(x = Comp.1, y = Comp.2, color = as.factor(A.cluster), fill = as.factor(A.cluster))) + geom_point() + 
  stat_ellipse(type = "t", geom = "polygon", alpha = 0.4)

The plot using clusplot

And the one using ggplot:

I hope this helps you figure out why your plots differ.
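One way to see why the PCA step does not change the clustering itself: for two-dimensional data, princomp() only centres and rotates the points, so every pairwise distance (and therefore the k-means input) is preserved. The base-R sketch below checks this on a smaller simulated data set (an assumption, to keep it fast):

```r
set.seed(42)
x <- data.frame(x = c(rnorm(100, sd = 123), rnorm(100, mean = 800, sd = 123)),
                y = c(rnorm(100, sd = 123), rnorm(100, mean = 800, sd = 123)))

pca_x <- princomp(x)

# Scores are the centred data multiplied by the loading (rotation) matrix
centred <- scale(as.matrix(x), center = TRUE, scale = FALSE)
scores_manual <- centred %*% unclass(pca_x$loadings)
all.equal(unname(scores_manual), unname(pca_x$scores))

# A rotation preserves all pairwise Euclidean distances
all.equal(as.vector(dist(x)), as.vector(dist(pca_x$scores)))
```

The ellipses are a separate matter. ?clusplot.default documents its span argument (default TRUE) as representing each cluster by the ellipse of smallest area containing all its points, whereas a normal-theory ellipse such as frame.type = 'norm' presumably draws need not contain every point. That difference may explain the observation in the question better than the scaling does.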

Question:

I have been trying to use autoplot (in the ggfortify R package) to plot data points in PCA coordinates. For data matrix D2,

autoplot(prcomp(D2),colour=color_codes)

works fine as far as generating a scatterplot of points in the space of principal components 1 and 2. However, PCs 1 and 2 only explain about 30% of the variance, and I would like to do the same for PCs 1+3, 2+3, 3+4, etc. Is there a simple argument in autoplot that will let me do this, and if not, what's the simplest function I can use to do so?

Additionally, is there some way to calculate and add centroids using autoplot?


Answer:

From ?autoplot.prcomp: autoplot(object, data = NULL, scale = 1, x = 1, y = 2, ...), where x is the number of the principal component shown on the x axis and y is the number of the principal component shown on the y axis.

Hence, if you need to plot PC2 vs PC3 and to add the centroid:

library(ggfortify)
set.seed(1)
D2 <- matrix(rnorm(1000),ncol=10)

prcmp <- prcomp(D2)
pc.x <- 2
pc.y <- 3
cnt.x <- mean(prcmp$x[,pc.x])
cnt.y <- mean(prcmp$x[,pc.y])
autoplot(prcmp, x = pc.x, y = pc.y) +
  annotate("point", x = cnt.x, y = cnt.y, colour = "red", size = 5)  # draw the centroid once
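Because prcomp() centres the data by default, the overall centroid computed above is, up to rounding, the origin. Per-group centroids are usually more informative. The base-R sketch below computes them from the scores with aggregate(), using the iris species as an assumed grouping variable. Note that autoplot.prcomp rescales the scores by default (scale = 1), so overlaying these raw-score centroids would presumably require plotting with scale = 0; treat that as an assumption to check against ?autoplot.prcomp.

```r
pc <- prcomp(iris[-5])

# Overall centroid of the scores: numerically zero because prcomp centres the data
overall <- colMeans(pc$x)
overall

# Centroid of each species on PC2 and PC3 (the components plotted above)
cents <- aggregate(as.data.frame(pc$x[, 2:3]),
                   by = list(Species = iris$Species), FUN = mean)
cents
```

The cents data frame has one row per species with columns Species, PC2 and PC3, ready to be passed as data to a point layer.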

Question:

I'm trying to reproduce the following stats::biplot plot with ggplot2::autoplot from ggfortify R package.

biplot(prcomp(USArrests, scale = TRUE))

Here is my ggplot2::autoplot code from ggfortify R package with its output.

devtools::install_github("sinhrks/ggfortify")
library(ggfortify)
ggplot2::autoplot(stats::prcomp(USArrests, scale=TRUE), label = TRUE, loadings.label = TRUE)

Questions

  1. Why the two plots are different? How to reproduce the base plot?
  2. How to add labels as shown in base plot?

Answer:

Thank you for using the package. The issue depends on the {dplyr} version and is being fixed in {ggfortify}. Could you update the package and try again?

The result after the fix is attached at the link below:

https://github.com/sinhrks/ggfortify/pull/21

Question:

I am using the autoplot function to make a PCA biplot. In my case, I would like to increase the point stroke to improve the readability of the plot. How can I do that?

Here's an example:

library(ggfortify)
df <- iris[c(1, 2, 3, 4)]
autoplot(prcomp(df), data = iris, colour="Species", fill="Species", shape="Species", geom="points", size=2) +
  scale_color_manual(values=c("#1B9E77","#D95F02","#7570B3")) +
  scale_fill_manual(values=c("#ffffff","#ffffff","#ffffff")) +
  scale_shape_manual(values=c(21:23))


Answer:

I found the solution to my problem by adding the last line of code to the plot:

library(ggfortify)
df <- iris[c(1, 2, 3, 4)]
p <- autoplot(prcomp(df), data = iris, colour="Species", fill="Species", shape="Species", geom="points", size=2) +
  scale_color_manual(values=c("#1B9E77","#D95F02","#7570B3")) +
  scale_fill_manual(values=c("#ffffff","#ffffff","#ffffff")) +
  scale_shape_manual(values=c(21:23))
p$layers[[1]]$aes_params$stroke <- 2
p

Question:

I'm plotting diagnostics plots for a regression model using autoplot. I would like to add a general single title for the graph.

As example:

library(ggfortify)
autoplot(lm(Petal.Width ~ Petal.Length, data = iris), label.size = 3)

I would like to place a "Title" at the top without modifying any subplot. Thanks in advance.

EDIT: I already tried grid.arrange() and got this error: Error in `$<-`(`*tmp*`, wrapvp, value = vp) : no method for assigning subsets of this S4 class.


Answer:

You can directly reference the list of ggplot objects within the ggmultiplot object returned by ggfortify's autoplot.lm:

p <- autoplot(lm(Petal.Width ~ Petal.Length, data = iris), label.size = 3)

gridExtra::grid.arrange(grobs = p@plots, top = "some title")

Question:

I'm trying to create a facet plot from timeseries data ...

if(!require('fma')){
    install.packages("fma")
    library(fma)
}
if(!require('ggfortify')){
    install.packages("ggfortify")
    library(ggfortify)
}
ec <- ts(econsumption, frequency = 12)
ec

Which results in ...

       Mwh temp
Jan 1 16.3 29.3
Feb 1 16.8 21.7
Mar 1 15.5 23.7
Apr 1 18.2 10.4
May 1 15.2 29.7
Jun 1 17.5 11.9
Jul 1 19.8  9.0
Aug 1 19.0 23.4
Sep 1 17.5 17.8
Oct 1 16.0 30.0
Nov 1 19.6  8.6
Dec 1 18.0 11.8

However, when I try to plot, the x-axis isn't as expected ...

autoplot(ec, facet=T)

The output ...

I was expecting autoplot to automatically set 12 months on the x axis. What am I doing wrong?

Note ...

str(ec)

Results in ...

 Time-Series [1:12, 1:2] from 1 to 1.92: 16.3 16.8 15.5 18.2 15.2 17.5 19.8 19 17.5 16 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "Mwh" "temp"

Answer:

Apparently that's how autoplot deals with months in ts objects. Using zoo and adding some formatting does the job:

library(zoo)  # provides as.zoo()
autoplot(as.zoo(ec), facet = TRUE) + scale_x_date(date_labels = '%b')
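The x axis in the question runs from 1 to 1.92 because a monthly ts created with frequency = 12 and the default start = 1 stores its time index as fractional years. The base-R sketch below uses placeholder values (only the structure matters) to show that index and how cycle() recovers the month of each observation:

```r
# Placeholder series with the same shape as ec: 12 monthly values starting at time 1
ec_demo <- ts(seq_len(12), frequency = 12)

# Time index: 1, 1 + 1/12, ..., 1 + 11/12 (about 1.92)
idx <- as.numeric(time(ec_demo))
round(idx, 2)

# Position within the year (1 = Jan, ..., 12 = Dec) mapped to month names
months <- month.abb[cycle(ec_demo)]
months
```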

Question:

I'm having trouble using the

scale_colour_manual 

function of ggplot2. I tried

guide = "legend" 

to force the legend to appear, but it doesn't work. Reproducible code:

library(ggfortify)
library(ggplot2)
p  <- ggdistribution(pgamma, seq(0, 100, 0.1), shape = 0.92, scale = 22, 
                     colour = 'red')
p2 <- ggdistribution(pgamma, seq(0, 100, 0.1), shape = 0.9, scale = 5, 
                     colour = 'blue', p=p)

p2 + 
theme_bw(base_size = 14) +
theme(legend.position ="top") +
xlab("Precipitación") +
ylab("F(x)") +
scale_colour_manual("Legend title", guide = "legend", 
                      values = c("red", "blue"), labels = c("Observado","Reforecast")) +
ggtitle("Ajuste Gamma")


Answer:

A solution with stat_function, mapping each curve's colour inside aes() so that ggplot2 builds a legend for it:

library(ggplot2)
library(scales)

cols <- c("LINE1"="red","LINE2"="blue")
df <- data.frame(x=seq(0, 100, 0.1))
ggplot(data=df, aes(x=x)) + 
stat_function(aes(colour = "LINE1"), fun=pgamma, args=list(shape = 0.92, scale = 22)) +
stat_function(aes(colour = "LINE2"), fun=pgamma, args=list(shape = 0.9, scale = 5)) +
theme_bw(base_size = 14) +
theme(legend.position ="top") +
xlab("Precipitación") +
ylab("F(x)") +
scale_colour_manual("Legend title", values = cols,
                    labels = c("Observado","Reforecast")) +
scale_y_continuous(labels=percent) +
ggtitle("Ajuste Gamma")

Question:

I am using RStudio and I have a time series data (ts object) called data1.

Here is how data1 looks:

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2014 135 172 179 189 212  47 301 183 247 292 280 325
2015 471 243 386 235 388 257 344 526 363 261 189 173
2016 272 267 197 217 393 299 343 341 315 305 384 497

To plot the above, I have run this code:

plot (data1)

and I get the following plot:

I want to have a plot that is broken by Year and I was thinking of implementing the facet_grid feature found in ggplot2 but since my data is a ts object, I can't use ggplot2 directly on it.

After some research, I've found that the ggfortify library works with ts objects. However, I am having a hard time figuring out how to use the facet_grid feature with it.

My aim is to plot something like the plot below from my ts data:

'Female' and 'Male' will be replaced by the years 2014, 2015 and 2016. The x axis will be the months (Jan, Feb, Mar, and so on) and the y axis will be the values in the ts object. I would prefer a line plot rather than a dot plot.

Am I on the right track here or is there another way of approaching this problem?


Answer:

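We can use autoplot with ggfortify loaded (ggfortify provides the autoplot method for ts objects and the Index column used in the facet formula). I will use the AirPassengers data as an example.

library(ggplot2)
library(ggfortify)  # autoplot() method for ts objects
library(lubridate)  # year()
autoplot(AirPassengers) + 
 facet_grid(. ~ year(Index), scales = "free_x") + 
 scale_x_date(date_labels = "%b")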
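If you prefer not to rely on ggfortify's fortify() output, one alternative is to build the year and month columns yourself and pass a plain data frame to ggplot2. A base-R sketch of that conversion for AirPassengers, using integer arithmetic on the start date rather than floor(time(...)) to sidestep floating-point edge cases:

```r
x <- AirPassengers
f <- frequency(x)   # 12 for monthly data
s <- start(x)       # c(1949, 1)

# Zero-based position of each observation, counted from January of the start year
pos <- (s[2] - 1) + seq_along(x) - 1

air_df <- data.frame(
  year  = s[1] + pos %/% f,   # calendar year
  month = pos %% f + 1,       # month number, 1 = Jan
  value = as.numeric(x)
)
air_df$month_name <- factor(month.abb[air_df$month], levels = month.abb)

head(air_df)
```

With ggplot2 loaded, something like ggplot(air_df, aes(month_name, value, group = year)) + geom_line() + facet_wrap(~ year) would then draw one line panel per year.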