## Hot questions for using ggplot2 with ggfortify

Question:

Using `autoplot` from `ggfortify` to create diagnostic plots:

    library(ggplot2)
    library(ggfortify)

    mod <- lm(Petal.Width ~ Petal.Length, data = iris)
    autoplot(mod, label.size = 3)

Is it possible to change the axis and plot titles (easily)? I'd like to translate them.

Answer:

The function `autoplot.lm` returns an S4 object (class `ggmultiplot`, see `` ?`ggmultiplot-class` ``). If you look at the help file, you'll see it has replacement methods for individual plots. That means you can extract an individual plot, modify it, and put it back. For example:

    library(ggplot2)
    library(ggfortify)

    mod <- lm(Petal.Width ~ Petal.Length, data = iris)
    g <- autoplot(mod, label.size = 3)  # store the ggmultiplot object

    # new x and y labels
    xLabs <- yLabs <- c("a", "b", "c", "d")

    # loop over all plots and modify each individually
    for (i in 1:4)
      g[i] <- g[i] + xlab(xLabs[i]) + ylab(yLabs[i])

    # display the new plot
    print(g)

Here I only modified the axis labels, but you can change anything about the plots individually (themes, colors, titles, sizes).
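Since the goal was translation, the same replacement pattern can also set the panel titles. This is a minimal sketch, assuming the four default diagnostic panels come back in their usual order and that `labs()` can be added to a `ggmultiplot` subset the same way `xlab()` is above. The German strings are illustrative placeholders, not part of the original answer.

```r
library(ggplot2)
library(ggfortify)

mod <- lm(Petal.Width ~ Petal.Length, data = iris)
g <- autoplot(mod, label.size = 3)

# Illustrative translations (placeholders), one per panel
titles <- c("Residuen vs. angepasste Werte", "Normal-Q-Q",
            "Skala-Lage", "Residuen vs. Hebelwirkung")

# Replace the title of each panel in turn, as with the axis labels above
for (i in seq_along(titles)) {
  g[i] <- g[i] + labs(title = titles[i])
}

print(g)
```

If the panel order differs in your version of the package, print `g` first and match the titles to what you see.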

Question:

I am running a principal component analysis with a varimax rotation and want to display the plot. That seems simple enough, but my loading vectors are very close in some places, so the labels identifying each factor tend to overlap. That is where ggrepel comes in, to separate the labels. My dilemma is figuring out how to connect the two. I used `autoplot`, which adds the desired text automatically, and that makes it difficult to define which text to repel. There may be other ways of going about it, and I am open to suggestions. Below are one of my attempts to repel the labels, followed by my code that works but has overlap.

    autoplot(prcomp(built.df9), loadings = TRUE, loadings.colour = 'blue',
             loadings.label = TRUE, loadings.label.size = 4,
             loading.label.color = 'red') +
      ggtitle(label = "Principal Component Analysis and Varimax Rotation for Built Environment Indicators") +
      geom_text_repel(aes(label = rownames(prcomp(built.df9))))

    autoplot(prcomp(built.df9), loadings = TRUE, loadings.colour = 'blue',
             loadings.label = TRUE, loadings.label.size = 4,
             loading.label.color = 'red') +
      ggtitle(label = "Principal Component Analysis and Varimax Rotation for Built Environment Indicators")

Answer:

You can use `loadings.label.repel=T` from the `ggfortify` package.

This example uses your same code, just with the `mtcars` dataset.

**Without repelled labels:**

    library(ggplot2)
    library(ggfortify)

    autoplot(prcomp(mtcars), loadings = TRUE, loadings.colour = 'blue',
             loadings.label = TRUE, loadings.label.size = 4,
             loadings.label.colour = 'red') +
      ggtitle(label = "Principal Component Analysis and Varimax Rotation for Built Environment Indicators")

**With repelled labels:**

    autoplot(prcomp(mtcars), loadings = TRUE, loadings.colour = 'blue',
             loadings.label = TRUE, loadings.label.size = 4,
             loadings.label.colour = 'red', loadings.label.repel = T) +
      ggtitle(label = "Principal Component Analysis and Varimax Rotation for Built Environment Indicators")

Question:

How can I add the red dashed contour lines that show the Cook's distance in this first plot to the second plot using `ggplot` and `ggfortify`?

Code used:

    library(ggfortify)

    model <- glm(mpg ~ wt, data = mtcars, family = gaussian())
    plot(model, which = 5)      # first plot
    autoplot(model, which = 5)  # second plot

I think that `geom_contour` could be added, but I do not know the formula used to calculate the Cook's distance lines.

Answer:

After some research, I managed to plot a contour of `level` using the formula `sqrt(level * length(coef(model)) * (1 - leverage)/leverage)`, which is what R uses to draw its contours for `plot.lm`. The method I used can definitely be improved though.

    library(ggplot2)
    library(ggfortify)

    model <- glm(mpg ~ wt, data = mtcars, family = gaussian())

    cd_cont_pos <- function(leverage, level, model) {
      sqrt(level * length(coef(model)) * (1 - leverage) / leverage)
    }
    cd_cont_neg <- function(leverage, level, model) {
      -cd_cont_pos(leverage, level, model)
    }

    autoplot(model, which = 5) +
      stat_function(fun = cd_cont_pos, args = list(level = 0.5, model = model),
                    xlim = c(0, 0.25), lty = 2, colour = "red") +
      stat_function(fun = cd_cont_neg, args = list(level = 0.5, model = model),
                    xlim = c(0, 0.25), lty = 2, colour = "red") +
      scale_y_continuous(limits = c(-2, 2.5))
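The contour formula follows from writing Cook's distance in terms of the standardized residual `r` and the leverage `h`: `D = r^2 * h / (p * (1 - h))`, where `p` is the number of coefficients. Solving for `r` at a fixed `D` gives the curve. The sketch below checks that identity in base R on a plain linear model; it draws nothing, and `d_manual` and `contour_r` are names introduced here for illustration.

```r
model <- lm(mpg ~ wt, data = mtcars)

h <- hatvalues(model)      # leverage of each observation
r <- rstandard(model)      # standardized residuals (the y axis of plot 5)
p <- length(coef(model))   # number of estimated coefficients

# Cook's distance rebuilt from r and h
d_manual <- r^2 * h / (p * (1 - h))
all.equal(unname(d_manual), unname(cooks.distance(model)))

# Inverting the identity gives the residual at which D equals `level`
contour_r <- function(leverage, level) {
  sqrt(level * p * (1 - leverage) / leverage)
}
```

`contour_r()` is the same expression as `cd_cont_pos()` above, with `p` taken from the enclosing environment instead of being recomputed from the model.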

Question:

I would like to plot PC2 against PC3 using the function `autoplot()` of the package `ggfortify`. By default just PC1 and PC2 are shown:

    library(ggfortify)

    myPCA <- prcomp(iris[-5])
    autoplot(myPCA)

I can get what I want by reordering and renaming columns in the prcomp object:

    myPCAtrunc <- myPCA
    myPCAtrunc[[1]] <- myPCAtrunc[[1]][c(2,3,1,4)]
    myPCAtrunc[[2]] <- myPCAtrunc[[2]][,c(2,3,1,4)]
    colnames(myPCAtrunc[[2]]) <- c("PC1","PC2","PC3","PC4")  # fake names
    myPCAtrunc[[5]] <- myPCAtrunc[[5]][,c(2,3,1,4)]
    colnames(myPCAtrunc[[5]]) <- c("PC1","PC2","PC3","PC4")  # fake names
    autoplot(myPCAtrunc, xlab = "PC2", ylab="PC3")

I know it is correct, because it is the same as `plot(myPCA$x[, c(2,3)])`.

But there must be a cleaner way to solve it. Any ideas?

Answer:

This issue was recently solved in `ggfortify`: `autoplot` now accepts `x` and `y` arguments that select the components to plot.

    autoplot(myPCA,  # your prcomp object
             x = 2,  # PC2
             y = 3)  # PC3

Question:

I would like to be able to adjust the positions of the loading labels, so that they do not fall atop the arrows. However, I do not know where the adjustments need to be made. The `geom_text` can be used to adjust the position of the site labels, but I cannot find where the vectors are stored in `str(g)`.

    library(ggplot2)
    library(ggfortify)

    df <- data.frame(replicate(10,sample(-10:10,10,rep=TRUE)))
    names(df) <- c('up','down','left','right','circle','square','triangle','x','r1','l1')
    rownames(df) <- paste('Dummy Site', seq(0,9,1))

    g <- autoplot(prcomp(df[,-11], scale=TRUE), data=df,
                  loadings.label=TRUE, loadings=TRUE,
                  loadings.label.size=8, loadings.colour='blue',
                  label.size=5) +
      geom_text(vjust=-1, label=rownames(df)) +
      theme(plot.background=element_blank(),
            panel.background=element_rect(fill='transparent',color='black',size=1),
            legend.text=element_text(hjust=1),
            legend.key=element_blank())
    g

I've looked in `ggplot2::theme` and I've examined the help docs for `autoplot`, but can't find any mention of adjusting the label position. Bonus points if it can adjust based on the vector of the arrow, but a static adjustment would be acceptable.

Currently, here is what the plot looks like:

Answer:

You can get the coordinates with `layer_data(g, 2)`. But `autoplot(prcomp.obj)` passes other arguments to `ggbiplot()`, so you can change the `label` and `loadings.label` positions using arguments of `ggbiplot()`, such as `loadings.label.hjust` (see `?ggbiplot`).

    arrow_ends <- layer_data(g, 2)[,c(2,4)]

    autoplot(prcomp(df[,-11], scale=TRUE), data=df,
             loadings.label=TRUE, loadings=TRUE,
             loadings.label.size=8, loadings.colour='blue',
             label.size=5, loadings.label.vjust = 1.2) +  # change loadings.label position
      geom_point(data = arrow_ends, aes(xend, yend), size = 3) +  # the coordinates from layer_data(...)
      geom_text(vjust=-1, label=rownames(df)) +
      theme(plot.background=element_blank(),
            panel.background=element_rect(fill='transparent',color='black',size=1),
            legend.text=element_text(hjust=1),
            legend.key=element_blank())

Question:

I created a graph with the *autoplot* function using the mtcars data and got a graph like this.

Here is my code:

    library(cluster)
    library(NbClust)
    library(ggplot2)
    library(ggfortify)

    x <- mtcars
    number.cluster <- NbClust(x, distance = "euclidean", min.nc = 1, max.nc = 5,
                              method = "complete", index = "ch")
    best.cluster <- as.numeric(number.cluster$Best.nc[1])
    x.pam <- pam(x, best.cluster)
    autoplot(x.pam, data = x, frame = T) + ggtitle("PAM MTCARS")

My question is: how do I get the PC1 and PC2 coordinates shown in this graph? Thank you.

Answer:

You can use `layer_data()` to get the data used for a ggplot object:

    p <- autoplot(x.pam, data = x, frame = T) + ggtitle("PAM MTCARS")
    layer_data(p, 1L)  # coordinates of all points
    layer_data(p, 2L)  # coordinates of points that contribute to polygons

Question:

I want to plot graphs for various Forecast models.

When I use autoplot after loading ggplot2, the plot appears like this:

autoplot(m_hw1_ff)

I also want to add the fitted lines for training and test data. For that I am using the below code:

    autoplot(m_hw1_ff) +
      geom_line(aes(y=m_reg1_ff$fitted), col = "green") +
      geom_line(data=test_ts_data, aes(y=test_ts_data), col = "red")

When the above code is run after just loading ggplot2, it gives the following error:

Error in order(data$PANEL, data$group, data$x) : argument 3 is not a vector

After referring to the comments and answers on the question "forecast v7 & ggplot2 graphics adding fitted line to autoplot", I loaded the ggfortify package as well.

The code works fine after that and the fitted lines for training and test data are plotted perfectly. However the shaded region, which was previously blue (dark and light for Lo 80, Hi 80, Lo 95 and Hi 95) has turned grey completely as in the graph below:

I want the shaded region to appear as it did in the first graph.

Answer:

There are several issues with your code.

The first plot is drawn by `forecast:::autoplot.forecast`, the `autoplot` method for `forecast` objects from the `forecast` package. When you load `ggfortify`, that method is masked by `ggfortify:::autoplot.forecast`, and this is why the plots behave differently.

My recommendation is to convert the prediction objects to data frames and plot using ggplot. This will allow a much higher level of customization. Example:

    library(forecast)
    library(ggfortify)

    d.arima <- auto.arima(AirPassengers)
    d.forecast <- forecast(d.arima, h = 50)

Create a data frame for plotting:

    for_plot <- ggfortify:::fortify.forecast(d.forecast, ts.connect = TRUE)

After loading `ggfortify` you can also just do:

    for_plot <- fortify(d.forecast, ts.connect = TRUE)

I wrote it as above only so you would see what is being called.

The `for_plot` object is a data frame, but not in the long format that ggplot likes, nor in a format that is friendly for conversion to long. It is manageable, though:

Example without conversion to long format (the ggplot heretic way):

    ggplot(data = for_plot) +
      geom_line(aes(x = Index, y = Data, color = "raw")) +
      geom_line(aes(x = Index, y = Fitted, color = "fitted")) +
      geom_line(aes(x = Index, y = `Point Forecast`, color = "point forecast")) +
      geom_ribbon(aes(x = Index, ymin = `Lo 80`, ymax = `Hi 80`, fill = "80"), alpha = 0.2) +
      geom_ribbon(aes(x = Index, ymin = `Lo 95`, ymax = `Hi 95`, fill = "95"), alpha = 0.2) +
      scale_fill_manual("what", values = c("blue", "dodgerblue")) +
      scale_color_manual("why", values = c("blue", "red", "green"))

The ggplot way would include splitting the data into two data frames, one for plotting the ribbon and the other for plotting the lines, converting each to long format and then plotting. Something like this:

    library(tidyverse)

    for_plot_lines <- for_plot %>%
      gather(key, value, 2:4) %>%
      select(key, value, Index)

    for_plot %>%
      filter(!is.na(`Point Forecast`)) %>%
      gather(Lo, ymin, c("Lo 80", "Lo 95")) %>%
      gather(Hi, ymax, c("Hi 80", "Hi 95")) -> for_plot_ribbon

    ggplot(data = for_plot_lines) +
      geom_line(aes(x = Index, y = value, color = key)) +
      geom_ribbon(data = for_plot_ribbon,
                  aes(x = Index, ymin = ymin, ymax = ymax, fill = Hi), alpha = 0.2)

Question:

library(ggfortify)

With ggfortify, if I plot one time series, I can set the line colour as follows:

autoplot(myts1,ts.colour='blue')

I can plot two ts objects in one graph:

autoplot(cbind(myts1,myts2),facets=FALSE)

But how can I set for example the line colour for the first ts 'blue' and for the second 'red'? In the second example, ts.colour doesn't work at all.

edit: here is a working example

    myts1 = filter(rnorm(100), filter=rep(1,20),circular=TRUE)
    myts2 = sin(seq(0,20,length.out=100))*5+5
    autoplot(cbind(myts1,myts2),facets=FALSE)

Answer:

You can use `scale_colour_manual`. When facets are disabled, `autoplot` colours each series by "variable", so simply add `scale_colour_manual`.

    pallete = c('red', 'blue', 'green', 'orange')
    autoplot(Canada, facets = FALSE, size = 3) + scale_colour_manual(values=pallete)

Otherwise, you must specify `ts.colour = "variable"` explicitly to colour each series.

autoplot(Canada, size = 3, ts.colour = 'variable') + scale_colour_manual(values=pallete)
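Applied to the two series from the question, the same idea might look like the sketch below. It assumes that `cbind()` on two `ts` objects keeps the series in the order given, so the first colour goes to `myts1` and the second to `myts2`; `both` is a name introduced here, and `stats::filter` is spelled out to avoid clashes with other packages that export `filter`.

```r
library(ggplot2)
library(ggfortify)

set.seed(1)
myts1 <- stats::filter(rnorm(100), filter = rep(1, 20), circular = TRUE)
myts2 <- ts(sin(seq(0, 20, length.out = 100)) * 5 + 5)
both <- cbind(myts1, myts2)

# With facets disabled each series gets its own colour level,
# so the manual scale assigns colours in series order
autoplot(both, facets = FALSE) +
  scale_colour_manual(values = c("blue", "red"))
```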

Question:

I'm trying to change the facet labels for an `stl` decomposition plot like the following:

    library(ggplot2)
    library(ggfortify)

    p <- autoplot(stl(AirPassengers, s.window = 'periodic'), ts.colour = "black", ts.size = 0.2)
    p

The plot originates from the ggfortify package. I wish to change the facet labels to:

c("Original Data", "Seasonal component", "Trend component", "Remainder")

I've tried to get into the structure of a `ggplot` (a lot of `str`'ing), and found that the following stores these names:

    str(p$layers[[1]]$data$variable)
    # Factor w/ 4 levels "Data","seasonal",..: 1 1 1

However, when I change this factor in place, I get four empty plots followed by the proper plots:

    p$layers[[1]]$data$variable <- factor(p$layers[[1]]$data$variable,
                                          labels=c("Original series", "Seasonal Component", "Trend component", "Remainder"))

How do I change the facet labels without getting these empty plots at the top?

Answer:

One possibility is to change the relevant components of the plot object.

    library(grid)

    # generate plot data which can be rendered
    g <- ggplot_build(p)

    # inspect the object and find the relevant element to be changed
    # str(g)

    # perform desired changes
    g$panel$layout$variable <- c("Original Data", "Seasonal component", "Trend component", "Remainder")

    # build a grob and 'draw' it
    grid.draw(ggplot_gtable(g))
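Another route that avoids editing the built object is to re-specify the facets with a labeller. This is a sketch under assumptions: the faceting column is called `variable` (as the `str()` output in the question suggests), its four levels are `Data`, `seasonal`, `trend` and `remainder` (only the first two are visible in the question), and adding a new `facet_grid()` replaces the one `autoplot()` set up. `new_labels` is a name introduced here.

```r
library(ggplot2)
library(ggfortify)

p <- autoplot(stl(AirPassengers, s.window = "periodic"),
              ts.colour = "black", ts.size = 0.2)

# Map the existing facet levels to display labels (level names assumed)
new_labels <- c(Data = "Original Data", seasonal = "Seasonal component",
                trend = "Trend component", remainder = "Remainder")

# Adding a facet specification replaces the one autoplot() set up
p + facet_grid(variable ~ ., scales = "free_y",
               labeller = labeller(variable = new_labels))
```

If a strip comes out blank or as `NA`, check `levels(p$layers[[1]]$data$variable)` and adjust the names of `new_labels` to match.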

Question:

I am using the k-means algorithm in `R` in order to separate variables. I would like to plot the results in `ggplot`, which I was able to manage; however, the results seem to differ between `ggplot` and `cluster::clusplot`.

So I wanted to ask what I am missing. For example, I know that the scaling is different, but I was wondering why, when using `clusplot`, all variables are inside the bounds, and when using `ggplot` they are not.

Is it just because of the scaling?

So are the two results below exactly the same?

    library(cluster)
    library(ggfortify)

    x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
               matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
    colnames(x) <- c("x", "y")
    x <- data.frame(x)

    A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)
    cluster::clusplot(cbind(x$x, x$y), A$cluster, color = T, shade = T)
    autoplot(kmeans(x, centers = 3, nstart = 50, iter.max = 500), data = x, frame.type = 'norm')

Answer:

For me, I get the same plot using either `clusplot` or `ggplot`. But to use `ggplot`, you first have to run a PCA on your data in order to get the same plot as `clusplot`. Maybe that is where you have an issue.

Here, with your example, I did:

    library(ggplot2)

    x <- rbind(matrix(rnorm(2000, sd = 123), ncol = 2),
               matrix(rnorm(2000, mean = 800, sd = 123), ncol = 2))
    colnames(x) <- c("x", "y")
    x <- data.frame(x)

    A <- kmeans(x, centers = 3, nstart = 50, iter.max = 500)
    cluster::clusplot(cbind(x$x, x$y), A$cluster, color = T, shade = T)

    # PCA scores plus the cluster assignment of each observation
    pca_x = princomp(x)
    x_cluster = data.frame(pca_x$scores, A$cluster)

    ggplot(x_cluster, aes(x = Comp.1, y = Comp.2,
                          color = as.factor(A.cluster), fill = as.factor(A.cluster))) +
      geom_point() +
      stat_ellipse(type = "t", geom = "polygon", alpha = 0.4)

The plot using clusplot

And the one using ggplot:

Hope it helps you to figure out the reason of your different plots

Question:

I have been trying to use autoplot (in the ggfortify R package) to plot data points in PCA coordinates. For data matrix D2,

autoplot(prcomp(D2),colour=color_codes)

works fine as far as generating a scatterplot of points in the space of principal components 1+2. However, PCA components 1+2 only explain about 30% of the covariance, and I would like to do the same for PCA 1+3, 2+3, and 3+4, etc. Is there a simple argument in autoplot that will let me do this, and if not, what's the simplest function I can use to do so?

Additionally, is there some way to calculate and add centroids using autoplot?

Answer:

From `?autoplot.prcomp`:

`autoplot(object, data = NULL, scale = 1, x = 1, y = 2, ...)`

where `x` is the principal component number used on the x axis and `y` is the principal component number used on the y axis.

Hence, if you need to plot PC2 vs PC3 and to add the centroid:

    library(ggfortify)

    set.seed(1)
    D2 <- matrix(rnorm(1000),ncol=10)
    prcmp <- prcomp(D2)

    pc.x <- 2
    pc.y <- 3
    cnt.x <- mean(prcmp$x[,pc.x])
    cnt.y <- mean(prcmp$x[,pc.y])

    autoplot(prcmp, x=2, y=3) +
      geom_point(x=cnt.x, y=cnt.y, colour="red", size=5)
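Because `prcomp` centers the data by default, the overall centroid of the scores sits at the origin, so per-group centroids are often more informative. The sketch below uses `iris` as a stand-in for grouped data. It assumes that `scale = 0` leaves the plotted scores unscaled, so that they share coordinates with `pca$x`; with the default `scale = 1` the centroids computed this way would not line up with the points. `pca` and `centroids` are names introduced here.

```r
library(ggplot2)
library(ggfortify)

pca <- prcomp(iris[, 1:4])

# Per-group centroids of the raw PC2/PC3 scores (base R)
centroids <- aggregate(pca$x[, c("PC2", "PC3")],
                       by = list(Species = iris$Species), FUN = mean)

# scale = 0 is assumed to keep the plotted scores in the same units as pca$x
autoplot(pca, data = iris, colour = "Species", x = 2, y = 3, scale = 0) +
  geom_point(data = centroids, aes(x = PC2, y = PC3),
             shape = 4, size = 5, stroke = 2, inherit.aes = FALSE)
```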

Question:

I'm trying to reproduce the following `stats::biplot` plot with `ggplot2::autoplot` from the `ggfortify` `R` package.

    biplot(prcomp(USArrests, scale = TRUE))

Here is my `ggplot2::autoplot` code from the `ggfortify` `R` package with its output.

    devtools::install_github("sinhrks/ggfortify")
    library(ggfortify)
    ggplot2::autoplot(stats::prcomp(USArrests, scale=TRUE), label = TRUE, loadings.label = TRUE)

**Questions**

- Why are the two plots different? How can I reproduce the base plot?
- How can I add labels as shown in the base plot?

Answer:

Thank you for using the package. The issue depends on the {dplyr} version and is being fixed in {ggfortify}. Could you update the package and then try again?

I've attached the result after the fix at the link below:

Question:

I am using the `autoplot` function to make a PCA biplot. In my case, I would like to increase the point stroke to improve the readability of the plot. How can I do that?

Here's an example:

    library(ggfortify)

    df <- iris[c(1, 2, 3, 4)]
    autoplot(prcomp(df), data = iris, colour="Species", fill="Species",
             shape="Species", geom="points", size=2) +
      scale_color_manual(values=c("#1B9E77","#D95F02","#7570B3")) +
      scale_fill_manual(values=c("#ffffff","#ffffff","#ffffff")) +
      scale_shape_manual(values=c(21:23))

Answer:

I found the solution to my problem by adding the last line of code to the plot:

    library(ggfortify)

    df <- iris[c(1, 2, 3, 4)]
    p <- autoplot(prcomp(df), data = iris, colour="Species", fill="Species",
                  shape="Species", geom="points", size=2) +
      scale_color_manual(values=c("#1B9E77","#D95F02","#7570B3")) +
      scale_fill_manual(values=c("#ffffff","#ffffff","#ffffff")) +
      scale_shape_manual(values=c(21:23))
    p$layers[[1]]$aes_params$stroke <- 2
    p

Question:

I'm plotting diagnostics plots for a regression model using autoplot. I would like to add a general single title for the graph.

As example:

    library(ggfortify)
    autoplot(lm(Petal.Width ~ Petal.Length, data = iris), label.size = 3)

I would like to place a "Title" at the top without modifying any subplot. Thanks in advance.

EDIT: I already tried grid.arrange() getting this error: Error in $<-(*tmp*, wrapvp, value = vp) : no method for assigning subsets of this S4 class.

Answer:

You can directly reference the list of ggplot objects within the `ggmultiplot` object returned by `ggfortify`'s `autoplot.lm`:

    p <- autoplot(lm(Petal.Width ~ Petal.Length, data = iris), label.size = 3)
    gridExtra::grid.arrange(grobs = p@plots, top = "some title")

Question:

I'm trying to create a facet plot from timeseries data ...

    if(!require('fma')){
      install.packages("fma")
      library(fma)
    }
    if(!require('ggfortify')){
      install.packages("ggfortify")
      library(ggfortify)
    }
    ec <- ts(econsumption, frequency = 12)
    ec

Which results in ...

           Mwh temp
    Jan 1 16.3 29.3
    Feb 1 16.8 21.7
    Mar 1 15.5 23.7
    Apr 1 18.2 10.4
    May 1 15.2 29.7
    Jun 1 17.5 11.9
    Jul 1 19.8  9.0
    Aug 1 19.0 23.4
    Sep 1 17.5 17.8
    Oct 1 16.0 30.0
    Nov 1 19.6  8.6
    Dec 1 18.0 11.8

However, when I try to plot, the x-axis isn't as expected ...

autoplot(ec, facet=T)

The output ...

I was expecting autoplot to automatically set 12 months on the x axis. What am I doing wrong?

Note ...

str(ec)

Results in ...

    Time-Series [1:12, 1:2] from 1 to 1.92: 16.3 16.8 15.5 18.2 15.2 17.5 19.8 19 17.5 16 ...
     - attr(*, "dimnames")=List of 2
      ..$ : NULL
      ..$ : chr [1:2] "Mwh" "temp"

Answer:

Apparently that's how `autoplot` deals with months in `ts` objects. Using `zoo` and adding some formatting does the job:

    library(zoo)  # for as.zoo()

    autoplot(as.zoo(ec), facet = TRUE) + scale_x_date(date_labels = '%b')

Question:

I'm having trouble using the `scale_colour_manual` function of ggplot. I tried `guide = "legend"` to force the legend to appear, but it doesn't work. Reproducible code:

    library(ggfortify)
    library(ggplot2)

    p <- ggdistribution(pgamma, seq(0, 100, 0.1), shape = 0.92, scale = 22, colour = 'red')
    p2 <- ggdistribution(pgamma, seq(0, 100, 0.1), shape = 0.9, scale = 5, colour = 'blue', p=p)
    p2 +
      theme_bw(base_size = 14) +
      theme(legend.position ="top") +
      xlab("Precipitación") +
      ylab("F(x)") +
      scale_colour_manual("Legend title", guide = "legend",
                          values = c("red", "blue"),
                          labels = c("Observado","Reforecast")) +
      ggtitle("Ajuste Gamma")

Answer:

A solution with `stat_function`:

    library(ggplot2)
    library(scales)

    cols <- c("LINE1"="red","LINE2"="blue")
    df <- data.frame(x=seq(0, 100, 0.1))

    ggplot(data=df, aes(x=x)) +
      stat_function(aes(colour = "LINE1"), fun=pgamma, args=list(shape = 0.92, scale = 22)) +
      stat_function(aes(colour = "LINE2"), fun=pgamma, args=list(shape = 0.9, scale = 5)) +
      theme_bw(base_size = 14) +
      theme(legend.position ="top") +
      xlab("Precipitación") +
      ylab("F(x)") +
      scale_colour_manual("Legend title", values=c(LINE1="red",LINE2="blue"),
                          labels = c("Observado","Reforecast")) +
      scale_y_continuous(labels=percent) +
      ggtitle("Ajuste Gamma")

Question:

I am using `RStudio` and I have a `time series` data (`ts` object) called `data1`.

Here is how `data1` looks:

         Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
    2014 135 172 179 189 212  47 301 183 247 292 280 325
    2015 471 243 386 235 388 257 344 526 363 261 189 173
    2016 272 267 197 217 393 299 343 341 315 305 384 497

To plot the above, I have run this code:

plot (data1)

and I get the following plot:

I want to have a plot that is broken down by year, and I was thinking of using the `facet_grid` feature found in `ggplot2`, but since my data is a `ts` object, I can't use `ggplot2` directly on it.

After some research, I've found that the `ggfortify` library works with `ts` objects. However, I am having a hard time figuring out how to use the `facet_grid` feature with it.

My aim is to plot something like the figure below from my `ts` data:

'Female' and 'Male' will be replaced by the years 2014, 2015 and 2016. The x-axis will be the months (Jan, Feb, Mar, and so on) and the y-axis will be the values in the `ts` file. I would prefer a line plot rather than a dot plot.

Am I on the right track here or is there another way of approaching this problem?

Answer:

We can use `ggplot2::autoplot`. I will use the `AirPassengers` data as an example.

    library(ggplot2)
    library(ggfortify)  # provides the autoplot() method for ts objects
    library(lubridate)

    autoplot(AirPassengers) +
      facet_grid(. ~ year(Index), scales = "free_x") +
      scale_x_date(date_labels = "%b")