## Hot questions for Using Ggplot2 in ggpmisc

Question:

A few years ago, a poster asked how to add a regression line equation and R2 to ggplot graphs in the question linked below.

Adding Regression Line Equation and R2 on graph

The top solution was this:

    lm_eqn <- function(df){
        m <- lm(y ~ x, df);
        eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
                         list(a = format(coef(m)[1], digits = 2),
                              b = format(coef(m)[2], digits = 2),
                              r2 = format(summary(m)$r.squared, digits = 3)))
        as.character(as.expression(eq));
    }

    p1 <- p + geom_text(x = 25, y = 300, label = lm_eqn(df), parse = TRUE)

I am using this code and it works great. However, I was wondering if it is at all possible to make this code have the R2 value and regression line equation on separate lines, instead of being separated by a comma.

Instead of like this

Something like this

Thanks in advance for your help!

Answer:

EDIT:

In addition to inserting the equation, I have fixed the sign of the intercept value. Setting the RNG seed with `set.seed(2L)` gives a positive intercept; the example below produces a negative intercept.

I also fixed the overlapping text in the `geom_text` calls.

    set.seed(3L)
    library(ggplot2)
    df <- data.frame(x = c(1:100))
    df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)

    lm_eqn <- function(df){
      # browser()
      m <- lm(y ~ x, df)
      a <- coef(m)[1]
      a <- ifelse(sign(a) >= 0,
                  paste0(" + ", format(a, digits = 4)),
                  paste0(" - ", format(-a, digits = 4)))
      eq1 <- substitute(paste(italic(y) == b, italic(x), a),
                        list(a = a,
                             b = format(coef(m)[2], digits = 4)))
      eq2 <- substitute(paste(italic(R)^2 == r2),
                        list(r2 = format(summary(m)$r.squared, digits = 3)))
      c(as.character(as.expression(eq1)), as.character(as.expression(eq2)))
    }

    labels <- lm_eqn(df)

    p <- ggplot(data = df, aes(x = x, y = y)) +
      geom_smooth(method = "lm", se=FALSE, color="red", formula = y ~ x) +
      geom_point() +
      geom_text(x = 75, y = 90, label = labels[1], parse = TRUE, check_overlap = TRUE) +
      geom_text(x = 75, y = 70, label = labels[2], parse = TRUE, check_overlap = TRUE)

    print(p)
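Another way to stack the two lines, sketched here as an alternative and not as the answer's own method, is plotmath's `atop()`, which draws its first argument above its second, so a single text layer is enough. The helper name `lm_eqn_stacked`, the simulated data, and the label position are assumptions made for illustration.

```r
# Sketch: build ONE plotmath string that stacks the equation over R^2.
# `lm_eqn_stacked` is a hypothetical helper, not part of ggplot2 or ggpmisc.
lm_eqn_stacked <- function(df) {
  m <- lm(y ~ x, df)
  a <- unname(coef(m)[1])
  b <- unname(coef(m)[2])
  r2 <- summary(m)$r.squared
  # Pick the sign explicitly so a negative intercept prints as "- 3.2"
  sign_chr <- if (a >= 0) "+" else "-"
  # Numbers are quoted so plotmath shows them exactly as formatted
  eq <- sprintf('italic(y) == "%s" * italic(x) ~ "%s" ~ "%s"',
                format(b, digits = 4), sign_chr, format(abs(a), digits = 4))
  rsq <- sprintf('italic(R)^2 == "%s"', format(r2, digits = 3))
  sprintf("atop(%s, %s)", eq, rsq)
}

# Simulated data (assumption), same shape as in the answer
set.seed(3L)
df <- data.frame(x = 1:100)
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)
label <- lm_eqn_stacked(df)

# Optional plot, only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x, y)) +
    ggplot2::geom_point() +
    ggplot2::geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
    ggplot2::annotate("text", x = 75, y = 80, label = label, parse = TRUE)
  # print(p) draws the plot
}
```

Because the label is a single string, there is only one position to tune, at the cost of both lines being centred on each other.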

Question:

I'm using the `R` package `ggpmisc`. I wonder how to put a hat on y in the regression equation, or how to get custom response and explanatory variable names in the regression equation on the graph.

    library(ggplot2)
    library(ggpmisc)

    df <- data.frame(x1 = c(1:100))
    set.seed(12345)
    df$y1 <- 2 + 3 * df$x1 + rnorm(100, sd = 40)

    p <- ggplot(data = df, aes(x = x1, y = y1)) +
      geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
      stat_poly_eq(formula = y ~ x,
                   aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
                   parse = TRUE) +
      geom_point()
    p

Answer:

I would turn off the default value for `y` that is pasted in and build your own formula. For example:

    ggplot(data = df, aes(x = x1, y = y1)) +
      geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
      stat_poly_eq(formula = y ~ x,
                   eq.with.lhs=FALSE,
                   aes(label = paste("hat(italic(y))","~`=`~",..eq.label..,"~~~", ..rr.label.., sep = "")),
                   parse = TRUE) +
      geom_point()

We use `eq.with.lhs=FALSE` to turn off the automatic inclusion of `y=` and then we `paste()` the `hat(y)` onto the front (with the equals sign). Note that the formatting comes from the `?plotmath` help page.
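Since the label is ordinary plotmath text, it can be checked outside ggplot2 before plotting: if `parse()` accepts the string, `parse = TRUE` should too. In this minimal sketch the right-hand side is a made-up stand-in for what `..eq.label..` would supply; its exact form is an assumption.

```r
# Made-up right-hand side standing in for ..eq.label.. (assumption)
rhs <- "2.1 + 3.05 ~ italic(x)"

# Same paste() pattern as in the answer: hat(y), an equals sign, then the rest
label <- paste("hat(italic(y))", "~`=`~", rhs, sep = "")

# parse() raises an error if the string is not valid plotmath/R syntax
parsed <- parse(text = label)

# To preview the rendering with base graphics:
# plot.new(); text(0.5, 0.5, parsed)
```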

Question:

I am trying to annotate a line plot with an arrow pointing to its highest point and to display the maximum value on the plot. I am using the `mtcars` dataset as my reference. Below is my code.

    e <- df$mpg
    ggplot(df, aes(x=e, y=df$hp)) +
      geom_line() +
      annotate("segment", color="blue", x=max(e), xend = max(e),
               y=max(df$hp), yend=max(df$hp), arrow=arrow())

Thanks in advance,

Answer:

Are you looking for something like this:

    labels <- data.frame(
      mpg = mtcars[which(mtcars$hp == max(mtcars$hp)), "mpg"]+7,
      hp = mtcars[which(mtcars$hp == max(mtcars$hp)), "hp"],
      text = paste0("Max value at mpg = ",
                    mtcars[which(mtcars$hp == max(mtcars$hp)), "mpg"],
                    " and hp = ", max(mtcars$hp)))

    ggplot(mtcars, aes(mpg, hp))+
      geom_line()+
      geom_text(data = labels, aes(label = text))+
      annotate("segment",
               x=mtcars[which(mtcars$hp == max(mtcars$hp)), "mpg"]+2,
               xend=mtcars[which(mtcars$hp == max(mtcars$hp)), "mpg"]+.2,
               y= mtcars[which(mtcars$hp == max(mtcars$hp)), "hp"],
               yend= mtcars[which(mtcars$hp == max(mtcars$hp)), "hp"],
               arrow=arrow(), color = "blue")

Explanation: To annotate the maximum, we need the value of mpg at which hp is largest. To do this we use `mtcars[which(mtcars$hp == max(mtcars$hp)), "mpg"]`. The `which()` call gives us the row position of that maximum so that we can get the corresponding value of mpg. Next we annotate with this position, adding a little bit of space (i.e., the +2 and +.2) so that it looks nicer. Lastly, we construct a data frame with the same positions (but a different offset) and use `geom_text()` to add the data label.
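The repeated `which(... == max(...))` lookup can be done once with `which.max()`, which returns the index of the first maximum. This is a sketch of the same idea with less repetition, not the answer's own code; the names `peak` and `label_text` and the offsets are chosen here for illustration.

```r
# Row index of the largest hp (first one, if there are ties)
i <- which.max(mtcars$hp)
peak <- mtcars[i, c("mpg", "hp")]

label_text <- paste0("Max value at mpg = ", peak$mpg, " and hp = ", peak$hp)
label_text
# → "Max value at mpg = 15 and hp = 335"

# Optional plot, only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(mpg, hp)) +
    geom_line() +
    # arrow pointing left towards the peak, starting 2 units to its right
    annotate("segment", x = peak$mpg + 2, xend = peak$mpg + 0.2,
             y = peak$hp, yend = peak$hp, arrow = arrow(), colour = "blue") +
    annotate("text", x = peak$mpg + 2.2, y = peak$hp,
             label = label_text, hjust = 0)
  # print(p) draws the plot
}
```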

Question:

Based on the example here, Adding Regression Line Equation and R2 on graph, I am struggling to include the regression line equation for my model in each facet. However, I can't figure out why it is changing the limits of my x axis.

    library(ggplot2)
    library(reshape2)

    df <- data.frame(year = seq(1979,2010),
                     M02 = runif(32,-4,6),
                     M06 = runif(32, -2.4, 5.1),
                     M07 = runif(32, -2, 7.1))
    df <- melt(df, id = c("year"))

    ggplot(data = df, mapping = aes(x = year, y = value)) +
      geom_point() +
      scale_x_continuous() +
      stat_smooth_func(geom = 'text', method = 'lm', hjust = 0, parse = T) +
      geom_smooth(method = 'lm', se = T) +
      facet_wrap(~ variable) # as you can see, the scale_x_axis goes back to 1800

If I include on the x the limits,

scale_x_continuous(limits = c(1979,2010))

it does not show the regression coefficient anymore. What am I doing wrong here?

*stat_smooth_func available here:* https://gist.github.com/kdauria/524eade46135f6348140

Answer:

You can use the `stat_poly_eq` function from the `ggpmisc` package.

    library(reshape2)
    library(ggplot2)
    library(ggpmisc)
    #> For news about 'ggpmisc', please, see https://www.r4photobiology.info/
    #> For on-line documentation see https://docs.r4photobiology.info/ggpmisc/

    df <- data.frame(year = seq(1979,2010),
                     M02 = runif(32,-4,6),
                     M06 = runif(32, -2.4, 5.1),
                     M07 = runif(32, -2, 7.1))
    df <- melt(df, id = c("year"))

    formula1 <- y ~ x

    ggplot(data = df, mapping = aes(x = year, y = value)) +
      geom_point() +
      scale_x_continuous() +
      geom_smooth(method = 'lm', se = TRUE) +
      stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~")),
                   label.x = "left", label.y = "top",
                   formula = formula1, parse = TRUE, size = 3) +
      facet_wrap(~ variable)

    ggplot(data = df, mapping = aes(x = year, y = value)) +
      geom_point() +
      scale_x_continuous() +
      geom_smooth(method = 'lm', se = TRUE) +
      stat_poly_eq(aes(label = paste(..eq.label.., sep = "~~~")),
                   label.x = "left", label.y = 0.15,
                   eq.with.lhs = "italic(hat(y))~`=`~",
                   eq.x.rhs = "~italic(x)",
                   formula = formula1, parse = TRUE, size = 4) +
      stat_poly_eq(aes(label = paste(..rr.label.., sep = "~~~")),
                   label.x = "left", label.y = "bottom",
                   formula = formula1, parse = TRUE, size = 4) +
      facet_wrap(~ variable)

Created on 2019-01-10 by the reprex package (v0.2.1.9000)

Question:

I'm trying to display the equations on the plot using the `stat_poly_eq` function of `ggpmisc`.

My problem is how to change the `y= ...` in the equation to `y1=...` and `y2=...` by referring to the `key` argument.

I tried to add the `eq.with.lhs` argument in the mapping but it does not recognize the argument.
I tried to pass a vector to the `eq.with.lhs` argument but it overlapped both elements in each equation...

Do you have a better idea?

As a last resort, I could use `geom_text` after calculating the equation coefficients myself, but it seemed to be a less efficient way to solve the problem.

Here is a reprex of my problem.

    data <- data.frame(x = rnorm(20)) %>%
      mutate(y1 = 1.2*x + rnorm(20, sd=0.2),
             y2 = 0.9*x + rnorm(20, sd=0.3)) %>%
      gather(value = value, key = key, -x)

    ggplot(data, aes(x = x, y = value)) +
      geom_point(aes(shape = key, colour = key)) +
      stat_poly_eq(aes(label = ..eq.label.., colour = key),
                   formula = y ~ poly(x, 1, raw = TRUE),
                   eq.x.rhs = "x",
                   # eq.with.lhs = c(paste0(expression(y[1]), "~`=`~"),
                   #                 paste0(expression(y[2]), "~`=`~")),
                   eq.with.lhs = paste0(expression(y[ind]), "~`=`~"),
                   parse = TRUE) +
      ylab(NULL)

Answer:

I'm not really sure if it's possible to do it through `ggpmisc`, but you can change the data once the plot is built, like so:

    library(tidyverse)
    library(ggpmisc)

    data <- data.frame(x = rnorm(20)) %>%
      mutate(y1 = 1.2*x + rnorm(20, sd=0.2),
             y2 = 0.9*x + rnorm(20, sd=0.3)) %>%
      gather(value = value, key = key, -x)

    p <- ggplot(data, aes(x = x, y = value)) +
      geom_point(aes(shape = key, colour = key)) +
      stat_poly_eq(aes(label = ..eq.label.., colour = key),
                   formula = y ~ poly(x, 1, raw = TRUE),
                   eq.x.rhs = "x",
                   eq.with.lhs = paste0(expression(y), "~`=`~"),
                   parse = TRUE) +
      ylab(NULL)

    # Build the plot, then rewrite the labels of the second layer (stat_poly_eq)
    temp <- ggplot_build(p)
    temp$data[[2]]$label <- temp$data[[2]]$label %>%
      fct_relabel(~ str_replace(.x, "y", paste0("y[", 1:2, "]")))

    grid::grid.newpage()
    grid::grid.draw(ggplot_gtable(temp))

Question:

I extracted some longitudinal temperature data from a .nc weather dataset (*ncdf4* package) and would like to label the local extrema with their respective dates from the x-axis using *ggplot2* and its extension *ggpmisc*, which includes `stat_peaks`/`stat_valleys`. Oddly, all the labels read the same: "Dec 1969".

I figured the most likely culprit was that the data used for the x-axis was not formatted correctly as `Date`, but the x-axis displays correctly and I have checked the class of the input data to confirm. I also tried applying `group=1`, which resulted in no change -- I admit I am new to R and ggplot2 (more familiar with Python/Pandas) and do not completely understand what group=1 does, though it was necessary to get the line to display correctly. Perhaps this is the result of a bug?

    ggplot(df_denver, aes(x=Date, y=Temp..C., group=1)) +
      geom_line() +
      scale_x_date(date_labels="%b %Y", date_breaks = "10 years", expand=c(0,0)) +
      stat_peaks(span=24, ignore_threshold = 0.80, color="red") +
      stat_peaks(geom="text", span=24, ignore_threshold = 0.80,
                 x.label.fmt = "%b %Y", color="red", angle=90, hjust=-0.1) +
      stat_valleys(span=24, ignore_threshold = 0.55, color="blue") +
      stat_valleys(geom="text", span=24, ignore_threshold = 0.55,
                   x.label.fmt = "%b %Y", color="blue", angle=90, hjust=1.1) +
      labs(x="Date", y="Temp (C)",
           title="Monthly Air Surface Temp for Denver from 1880 on")

Here are the first 100 rows of my dataset that produce 3 peaks and 3 valleys to illustrate:

              Date    Temp..C.
    1   1880-01-01  2.91287017
    2   1880-02-01 -2.73586297
    3   1880-03-01 -2.04185677
    4   1880-04-01  0.37948364
    5   1880-05-01  0.78548384
    6   1880-06-01  0.44176754
    7   1880-07-01 -1.06966007
    8   1880-08-01 -0.53162575
    9   1880-09-01 -0.29665694
    10  1880-10-01 -2.08401608
    11  1880-11-01 -9.46955109
    12  1880-12-01 -1.52052176
    13  1881-01-01 -2.53366208
    14  1881-02-01 -1.88263988
    15  1881-03-01 -0.06864686
    16  1881-04-01  3.32321167
    17  1881-05-01  1.75613177
    18  1881-06-01  2.82765651
    19  1881-07-01  1.76543093
    20  1881-08-01  1.39409852
    21  1881-09-01 -0.98141575
    22  1881-10-01 -0.63346595
    23  1881-11-01 -1.95676208
    24  1881-12-01  3.28983855
    25  1882-01-01 -0.64792717
    26  1882-02-01  2.15854502
    27  1882-03-01  2.91465187
    28  1882-04-01  0.56616443
    29  1882-05-01 -1.89441001
    30  1882-06-01 -0.63149375
    31  1882-07-01 -0.64883423
    32  1882-08-01  0.82802373
    33  1882-09-01  0.66150969
    34  1882-10-01 -0.54113626
    35  1882-11-01 -1.21310496
    36  1882-12-01  1.30559540
    37  1883-01-01 -1.41802752
    38  1883-02-01 -6.39232874
    39  1883-03-01  2.96320987
    40  1883-04-01 -0.48122203
    41  1883-05-01 -0.99614143
    42  1883-06-01 -0.67229420
    43  1883-07-01 -0.56595141
    44  1883-08-01  0.52161294
    45  1883-09-01  0.09190032
    46  1883-10-01 -2.65115738
    47  1883-11-01  1.88332438
    48  1883-12-01 -0.19942272
    49  1884-01-01 -0.34669495
    50  1884-02-01 -2.21085262
    51  1884-03-01  0.55254096
    52  1884-04-01 -1.21859336
    53  1884-05-01 -0.40969065
    54  1884-06-01  0.44454563
    55  1884-07-01  1.28881764
    56  1884-08-01 -1.09331822
    57  1884-09-01  1.52377772
    58  1884-10-01  1.76569140
    59  1884-11-01  0.72411090
    60  1884-12-01 -4.64927006
    61  1885-01-01 -1.03242493
    62  1885-02-01 -0.79325873
    63  1885-03-01  0.65910935
    64  1885-04-01 -0.10181000
    65  1885-05-01 -1.50702798
    66  1885-06-01 -1.25801849
    67  1885-07-01 -0.88433135
    68  1885-08-01 -1.18410277
    69  1885-09-01  0.15284735
    70  1885-10-01 -0.91721576
    71  1885-11-01  1.82403481
    72  1885-12-01  1.68553519
    73  1886-01-01 -4.21202993
    74  1886-02-01  2.43953681
    75  1886-03-01 -2.24947429
    76  1886-04-01 -1.22557247
    77  1886-05-01  2.66594267
    78  1886-06-01 -0.21662886
    79  1886-07-01  1.09909940
    80  1886-08-01  0.63720244
    81  1886-09-01 -0.11845125
    82  1886-10-01  0.49225059
    83  1886-11-01 -3.16969180
    84  1886-12-01  2.18220520
    85  1887-01-01  0.51427501
    86  1887-02-01 -0.69656581
    87  1887-03-01  3.96693182
    88  1887-04-01  0.92614591
    89  1887-05-01  1.66550291
    90  1887-06-01  1.88668025
    91  1887-07-01 -1.48990893
    92  1887-08-01 -0.98355341
    93  1887-09-01  0.93172997
    94  1887-10-01 -1.12551820
    95  1887-11-01  1.07798636
    96  1887-12-01 -2.15758419
    97  1888-01-01 -1.69266903
    98  1888-02-01  2.55955243
    99  1888-03-01 -1.83599913
    100 1888-04-01  3.63450384

As you can see, the labels produced by `stat_peaks` and `stat_valleys` are identical and not even within the range of the abbreviated data, rather than the correct dates corresponding to the x-axis.

Monthly Air Surface Temp for Denver from 1880 on

Answer:

`stat_peaks` and `stat_valleys` labels will work with dates in `POSIXct` format:

    df_denver$Date <- as.POSIXct(df_denver$Date, format = "%Y-%m-%d")

    ggplot(df_denver, aes(x=Date, y=Temp)) +
      geom_line() +
      scale_x_datetime(date_labels="%b %Y", date_breaks = "1 year", expand=c(0,0)) +
      stat_peaks(span=24, ignore_threshold = 0.80, color="red") +
      stat_peaks(geom="text", span=24, ignore_threshold = 0.80,
                 x.label.fmt = "%b %Y", color="red", angle=90, hjust=-0.1) +
      stat_valleys(span=24, ignore_threshold = 0.55, color="blue") +
      stat_valleys(geom="text", span=24, ignore_threshold = 0.55,
                   x.label.fmt = "%b %Y", color="blue", angle=90, hjust=1.1) +
      labs(x="Date", y="Temp (C)",
           title="Monthly Air Surface Temp for Denver from 1880 on") +
      expand_limits(y = 6)

**Note**: `scale_x_date` was changed to `scale_x_datetime`. In addition, `date_breaks` was changed to 1 year to show x-axis labels for the example data, and `expand_limits` was added to ensure the peak labels are readable. `group=1` should not be needed.
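One plausible reading of the "Dec 1969" labels, offered as an assumption since the thread does not confirm it, is a unit mismatch: a `Date` is stored as days since 1970-01-01, while a `POSIXct` is stored as seconds since that instant. If a day count is formatted as though it were seconds, every date in the series lands within a day of the epoch. The sketch below uses only base R to show the effect and the conversion used in the answer.

```r
d <- as.Date("1880-01-01")

# Underlying number: days since 1970-01-01 (negative for dates before 1970)
days <- as.numeric(d)

# Misreading that day count as SECONDS since the epoch
misread <- as.POSIXct(days, origin = "1970-01-01", tz = "UTC")
format(misread, "%Y-%m")
# → "1969-12"

# Converting properly keeps the intended date
ok <- as.POSIXct(format(d), tz = "UTC")
format(ok, "%Y-%m")
# → "1880-01"
```

Whether `stat_peaks()` actually takes this path internally depends on the ggpmisc version, so this is an illustration of the symptom and not a diagnosis of the package.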

Question:

I am trying to create some correlation plots based on a data frame that I created using tidyr's spread() function. When I used the spread function, it created NAs in the new data frame. This makes sense because the data frame had concentration values for different parameters at different time periods.

**Here is an example screenshot of the original data frame:**

**When I used the spread function it gave me a data frame like this(sample data):**

structure(list(orgid = c("11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD", "11NPSWRD"), locid = c("11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2", "11NPSWRD-MORR_NPS_PR2" ), stdate = structure(c(9891, 9891, 9891, 9920, 9920, 9920, 9949, 9949, 9949, 9978, 9978, 9978, 10011, 10011, 10011, 10067, 10067, 10073, 10073, 10073), class = "Date"), sttime = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), class = c("hms", "difftime"), units = "secs"), valunit = c("uS/cm", "mg/l", "mg/l", "uS/cm", "mg/l", "mg/l", "uS/cm", "mg/l", "mg/l", "uS/cm", "mg/l", "mg/l", "uS/cm", "mg/l", "mg/l", "uS/cm", "mg/l", "uS/cm", "mg/l", "mg/l"), swqs = c("FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP", "FW2-TP" ), WMA = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), year = c(1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L, 1997L), Chloride = c(NA, 35, NA, NA, 45, NA, NA, 30, NA, NA, 30, NA, NA, 30, NA, NA, NA, NA, 35, NA), `Specific conductance` = c(224, NA, NA, 248, NA, NA, 204, NA, NA, 166, NA, NA, 189, NA, NA, 119, NA, 194, NA, NA), `Total dissolved solids` = c(NA, NA, 101, NA, NA, 115, NA, NA, 96, NA, NA, 79, NA, NA, 
89, NA, 56, NA, NA, 92)), .Names = c("orgid", "locid", "stdate", "sttime", "valunit", "swqs", "WMA", "year", "Chloride", "Specific conductance", "Total dissolved solids"), row.names = c(NA, 20L), class = "data.frame")

The problem I am having is when I try and create the correlation plot it's giving me a plot with only one point.. I'm guessing this is because there are NAs in the data frame.. But when I try and filter the NAs it gives me a data frame with 0 observations.. Any help would be greatly appreciated!!

**Example code to create correlation plot:**

    plot1<-ggplot(data=df,aes(x="Specific conductance",y="Chloride"))+
      geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x)+
      geom_point()

**I would like to create a plot like this:**

Answer:

You need to remove NAs & collapse rows which have the same Date

    library(tidyverse)

    # clean up column names by removing spaces
    # (str_replace_all so that every space is replaced, not only the first)
    df <- df %>%
      select_all(~str_replace_all(., " ", "_"))

    # removing NAs & collapsing rows which have the same Date
    require(data.table)
    DT <- data.table(df)
    DT2 <- unique(DT[, lapply(.SD, na.omit), by = stdate], by = "stdate")

    library(ggpmisc)
    formula1 <- y ~ x

    ggplot(data = DT2, aes(x = Specific_conductance, y = Chloride)) +
      geom_point() +
      geom_smooth(method = "lm", se = FALSE, formula = formula1) +
      stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~")),
                   label.x.npc = "left", label.y.npc = "top",
                   formula = formula1, parse = TRUE, size = 6) +
      theme_bw(base_size = 14)

Created on 2018-09-10 by the reprex package (v0.2.0.9000).
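The collapsing step can also be sketched in base R, which may make the idea easier to follow: for each date, keep the first non-missing value of every measurement column. The toy data frame below reuses a few values from the question with already-cleaned column names; `first_non_na` is a hypothetical helper written for this example. Note also that in the question's plot call the quoted names inside `aes()` are treated as constant strings and not as columns, which on its own produces a single point; column names containing spaces need backticks or renaming.

```r
# Toy version of the spread() output: one measurement per row, NAs elsewhere
wide <- data.frame(
  stdate = rep(c("1997-01-30", "1997-02-28"), each = 3),
  Chloride = c(NA, 35, NA, NA, 45, NA),
  Specific_conductance = c(224, NA, NA, 248, NA, NA),
  Total_dissolved_solids = c(NA, NA, 101, NA, NA, 115)
)

# Hypothetical helper: first non-missing value, or NA if there is none
first_non_na <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) > 0) v[1] else NA_real_
}

# One row per date, each column collapsed independently
collapsed <- aggregate(wide[-1], by = list(stdate = wide$stdate),
                       FUN = first_non_na)
collapsed
```

With one complete row per date, `aes(x = Specific_conductance, y = Chloride)` has paired values to plot.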

Question:

I have the following working-toy example:

    trunctiris <- iris [1:102,]

    analysis <- trunctiris %>%
      group_by(Species) %>%
      nest() %>%
      mutate(model = map(data, ~lm(Sepal.Length ~ Sepal.Width, data = .)),
             cor = map(data, ~tidy(cor.test(.x$Sepal.Length, .x$Sepal.Width), 3)))

    stats <- analysis %>% unnest(cor)

    ggplot(trunctiris, aes(x = Sepal.Length, y = Sepal.Width)) +
      geom_point(shape = 21) +
      geom_text(data = stats, aes(label = sprintf("r = %s", round(estimate, 3)), x = 7, y = 4)) +
      geom_text(data = stats, aes(label = sprintf("p = %s", round(p.value, 3)), x = 7, y = 3.8)) +
      geom_smooth(method = "lm", formula = y ~ x) +
      stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~")),
                   formula = y ~ x, parse = TRUE) +
      facet_wrap(~Species)

The code was provided in another question. However, I haven't been able to make it work with my data. The problem is that some (not all) of my groups have fewer than 3 observations, and so, in the "analysis" part, R returns:

Error in mutate_impl(.data, dots) : not enough finite observations

which is in relation to the fact that there are not enough observations in the group (in this case: virginica). I want to get around this, I've tried 'try(if nrow(data) >= 2)' or similar.. like the following:

    analysis <- iris %>%
      group_by(Species) %>%
      nest() %>%
      mutate(model = map(data, ~lm (Sepal.Length ~ Sepal.Width, data = .)),
             cor = if_else(nrow(data) <= 2,
                           warning ("Must have at least 3 rows of data"),
                           (map(data, ~tidy(cor.test(.x$Sepal.Length, .x$Sepal.Width), 3)))))

which returns:

    Error in mutate_impl(.data, dots) : not enough finite observations
    In addition: Warning message:
    In if_else(nrow(list(list(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, :
      Must have at least 3 rows of data

Does anyone know an easy way to get around this? I'd like to skip the problematic group and keep on going.

Many thanks and sorry for my very basic R skills.

Answer:

`purrr::safely` or `purrr::possibly` allow for easy guarding against errors when you are `map`ping. In this case, a good strategy is to wrap the call to `tidy(cor.test(...` in `possibly` and return an empty data.frame if an error occurs:

    library(purrr)

    analysis <- trunctiris %>%
      group_by(Species) %>%
      nest() %>%
      mutate(
        model = map(data, ~lm(Sepal.Length ~ Sepal.Width, data = .)),
        cor = map(data, possibly(
          ~tidy(cor.test(.x$Sepal.Length, .x$Sepal.Width), 3),
          otherwise = data.frame())
        )
      )

    # A tibble: 3 × 4
         Species              data    model                  cor
          <fctr>            <list>   <list>               <list>
    1     setosa <tibble [50 × 4]> <S3: lm> <data.frame [1 × 8]>
    2 versicolor <tibble [50 × 4]> <S3: lm> <data.frame [1 × 8]>
    3  virginica  <tibble [2 × 4]> <S3: lm> <data.frame [0 × 0]> #<- Note the empty df here

Which becomes:

unnest(analysis)

    # A tibble: 2 × 9
         Species  estimate statistic      p.value parameter  conf.low conf.high
          <fctr>     <dbl>     <dbl>        <dbl>     <int>     <dbl>     <dbl>
    1     setosa 0.7425467  7.680738 6.709843e-10        48 0.5851391 0.8460314
    2 versicolor 0.5259107  4.283887 8.771860e-05        48 0.2900175 0.7015599
    # ... with 2 more variables: method <fctr>, alternative <fctr>

And so the group that gave an error is successfully removed from the end result.
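The same guard can be sketched without purrr by using base R's `tryCatch()`; this swaps in a different mechanism from the answer's `possibly()`, but the behaviour is the same: a group whose test fails contributes an empty data frame. The helper name `safe_cor` and the choice of returned columns are assumptions for illustration.

```r
trunc_iris <- iris[1:102, ]   # leaves only 2 virginica rows

# Hypothetical helper: correlation test that returns an empty data frame
# instead of stopping when a group has too few observations
safe_cor <- function(d) {
  tryCatch({
    ct <- cor.test(d$Sepal.Length, d$Sepal.Width)
    data.frame(estimate = unname(ct$estimate), p.value = ct$p.value)
  }, error = function(e) data.frame())
}

by_species <- split(trunc_iris, trunc_iris$Species)
results <- lapply(by_species, safe_cor)

# rbind() ignores the zero-column data frame, so the failing group drops out
stats <- do.call(rbind, results)
stats
```

The design trade-off is the same as with `possibly()`: the failing group disappears silently, so it can be worth logging which groups returned an empty result.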

Question:

I have run the script below numerous times and it has worked until this morning, when it suddenly produced the error message:

Error in terms.formula(formula, data = data) : 'data' argument is of the wrong type.

I have not changed anything and I need to find out why it suddenly doesn't seem to work. Previous answers to similar questions have not helped.

My data:

DPUT(harvest2) structure(list(Year = c(1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016), Count = c(750, 757, 592, 693, 667, 757, 719, 670, 733, 796, 923, 921, 944, 1010, 825, 762, 825, 844, 809, 830, 768, 823, 749, 675, 700, 637, 708, 697, 754, 694, 636, 717, 786, 731, 769, 732, 710, 610.5, 593, 529, 664, 788, 731, 644, 653, 656), SexRat = c(1.91812865497076, 2.34567901234568, 1.69178082191781, 1.46766169154229, 1.30396475770925, 1.4364406779661, 1.32098765432099, 1.48584905660377, 1.5906976744186, 1.91414141414141, 1.48905109489051, 1.61382113821138, 1.52380952380952, 1.87777777777778, 1.75438596491228, 1.6695652173913, 1.81566820276498, 1.79295154185022, 1.85024154589372, 1.75446428571429, 1.83163265306122, 1.92857142857143, 1.76635514018692, 1.5, 2.26190476190476, 1.76704545454545, 2.38125, 1.80924855491329, 2.33333333333333, 1.81182795698925, 2.20446096654275, 2.02790697674419, 2.1140350877193, 2.05, 2.20183486238532, 1.90983606557377, 2.02262443438914, 1.75116279069767, 1.86842105263158, 1.87951807228916, 2.08542713567839, 2.01724137931034, 1.95833333333333, 1.81165919282511, 2.12135922330097, 1.97260273972603)), class = "data.frame", row.names = c(NA, -46L))

My script:

    # Function for the equation
    lm_eqn = function(df){
      m = lm(y ~ poly(x, 3), df) #3rd degree polynomial
      eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
                       list(a = format(coef(m)[1], digits = 2),
                            b = format(coef(m)[2], digits = 2),
                            r2 = format(summary(m)$r.squared, digits = 4)))
      as.character(as.expression(eq))
    }

    # Make the plot
    library(ggplot2)
    ggplot(harvest2, aes(x = Year, y = Count)) +
      scale_y_continuous(minor_breaks = seq(500, 1100, by = 50),
                         breaks = seq(500, 1100, by = 100),
                         limits = c(500, 1100), expand = c(0, 0)) +
      scale_x_continuous(minor_breaks = seq(1970, 2018, by = 1),
                         breaks = seq(1970, 2018, by = 5),
                         limits = c(1970, 2018)) +
      geom_point(stat = 'identity', size=2) +
      stat_smooth(method = "lm", se = TRUE, fill = NA, size = 1.3,
                  formula = y ~ poly(x, 3, raw = TRUE), col = "red") +
      annotate("text", x = 1975, y = 1075, label = lm_eqn(df),
               hjust = 0, size = 3.5, parse = TRUE) +
      xlab(" ") + ylab("Count") +
      theme_light() +
      ggtitle(" ")

Any help much appreciated.

Answer:

How about using `stat_poly_eq` from the `ggpmisc` package? See this if you want to separate the equation and R2 into two lines.

    library(ggplot2)
    library(ggpmisc)

    # define formula
    formula1 <- y ~ poly(x, 3, raw = TRUE)

    ggplot(harvest2, aes(x = Year, y = Count)) +
      scale_y_continuous(
        minor_breaks = seq(500, 1100, by = 50),
        breaks = seq(500, 1100, by = 100),
        limits = c(500, 1100), expand = c(0, 0)) +
      scale_x_continuous(
        minor_breaks = seq(1970, 2018, by = 1),
        breaks = seq(1970, 2018, by = 5),
        limits = c(1970, 2018)) +
      geom_point(stat = "identity", size = 2) +
      stat_smooth(
        method = "lm", se = TRUE, fill = NA, size = 1.3,
        formula = formula1, col = "red") +
      # show the equation and R2
      stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~")),
                   label.x.npc = "left", label.y.npc = "top",
                   formula = formula1, parse = TRUE, size = 5) +
      xlab(" ") + ylab("Count") +
      theme_light() +
      ggtitle(" ")

Created on 2019-02-12 by the reprex package (v0.2.1.9000)
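As for why the original script may have stopped working, one likely explanation (an assumption, since the thread does not state it) is the `lm_eqn(df)` call: the data frame is named `harvest2`, so in a fresh session `df` resolves to `stats::df`, the density function of the F distribution. Passing a function as `data` produces exactly this kind of error; the script would have run earlier only if a data frame called `df` happened to exist in the workspace. A minimal sketch, using a few rows from the question as toy data:

```r
# With no data frame named `df`, the name refers to a function in stats
is.function(stats::df)
# → TRUE

# Passing that function as `data` makes lm() fail
msg <- tryCatch(
  {
    lm(y ~ x, data = stats::df)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# Passing the real data frame and its own column names works
harvest_demo <- data.frame(
  Year = 1971:1980,
  Count = c(750, 757, 592, 693, 667, 757, 719, 670, 733, 796)
)
m <- lm(Count ~ Year, data = harvest_demo)
coef(m)
```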

Question:

I have to create a large number (100+) of ggplots of linear models. I would like to add the p-value (and potentially R2) to each plot. I know it is possible to do this using `ggpmisc`. Here, I employ `stat_fit_glance` to add the p-value. My 'problem' is that both of these require me to run `lm` first, with the result inserted as formula = my_lm.

As I have to create a large number of plots, I was wondering if there is a way to avoid creating the lm object first, and simply have it calculated while producing the ggplot? I can do it for t-tests for boxplots using `stat_compare_means`, and really hope to find a way to do it with lm's as well.

My code is present below. I would like to be able to skip the first line of code:

    my_lm <- lm(y ~ x)

    ggplot(data = complete, aes(x= x, y = y))+
      geom_point()+
      theme_classic()+
      geom_smooth(method = "lm")+
      labs(x="Ellenberg F", y = "Species richness")+
      stat_fit_glance(method = 'lm',
                      method.args = list(data = complete, formula = my_lm),
                      geom = 'text',
                      aes(label = paste("p-value = ", signif(..p.value.., digits = 4), sep = "")),
                      label.x = 8.5, label.y = 25, size = 3)

I have tried simply putting formula = y ~ x with no luck.

Answer:

From the help of `ggpmisc::stat_fit_glance`: `method.args = list(formula = y ~ x)`.
This means that you don't need to run an `lm` first.
You only need to specify the formula for the linear model.

    library(ggpmisc)

    set.seed(1)
    n <- 100
    x <- 8+rnorm(n)
    y <- 11+x+2*rnorm(n)
    complete <- data.frame(x, y)
    summary(lm(y~x))

    ggplot(data = complete, aes(x= x, y = y))+
      geom_point()+
      theme_classic()+
      geom_smooth(method = "lm")+
      labs(x="Ellenberg F", y = "Species richness")+
      stat_fit_glance(method = 'lm',
                      method.args = list(formula = y ~ x),
                      geom = 'text',
                      aes(label = paste("p-value=", signif(..p.value.., digits = 4),
                                        " R-squared=", signif(..r.squared.., digits = 3), sep = "")),
                      label.x = 8.5, label.y = 25, size = 5)
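To sanity-check the numbers shown on the plot, the same quantities can be pulled from `summary(lm(...))` in base R. As far as I understand, `stat_fit_glance` takes its values from a `broom::glance()`-style model summary, where the p-value is that of the overall F-test; for a model with a single predictor this equals the p-value of the slope. The sketch below only relies on base R and reuses the simulated data from the answer.

```r
set.seed(1)
n <- 100
x <- 8 + rnorm(n)
y <- 11 + x + 2 * rnorm(n)

fit <- lm(y ~ x)
s <- summary(fit)

# p-value of the slope, from the coefficient table
p_slope <- coef(s)["x", "Pr(>|t|)"]

# p-value of the overall F-test, computed from the stored F statistic
f <- s$fstatistic
p_model <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

r2 <- s$r.squared

# Text in the same style as the plot label
paste0("p-value=", signif(p_model, 4), " R-squared=", signif(r2, 3))
```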

Question:

I want to show the linear equation and the R-squared in each plot in facet mode. This is my code so far.

    library("ggplot2")
    datos <- read.table("~/Documents/master2/plots/dosis_todos/datos.dat", header=TRUE, quote="\"")

    ggplot(datos, aes(x = corriente, y = dosis, colour = cristal)) +
      geom_point() +
      geom_smooth(method="lm", se=F) +
      facet_wrap(~datos$cristal)

After reading about ggpmisc in this answer, I tried

    my.formula <- y ~ x
    library("ggpmisc")

    ggplot(datos, aes(x = corriente, y = dosis, colour = cristal)) +
      geom_point() +
      geom_smooth(method="lm", se=F, formula=my.formula) +
      stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
                   formula = my.formula, parse = TRUE) +
      facet_wrap(~datos$cristal)

Which kinda works, except that the position of the equation goes down in every plot until it disappears...

If I save my plot big enough, I can see all my text in the 9 plots ....going down.

So I guess the question is how to keep fixed the position of the equation and the R-squared information?

Thanks

Ps. Yes, I know N57 has only 3 points :(

Ps. Here is the link to my data

Answer:

@murpholinox Yes, you are correct, the code in 'ggpmisc' is not smart enough (yet) to detect when aesthetic values like the different colours are unique to each panel.
However, it is possible to manually position the equations by passing a position in data units to parameters `label.y` and/or `label.x`. So, there is a work-around.

    library("ggplot2")
    library("ggpmisc")

    datos <- read.table("datos.dat", header=TRUE, quote="\"")
    my.formula <- y ~ x

    ggplot(datos, aes(x = corriente, y = dosis, colour = cristal)) +
      geom_point() +
      geom_smooth(method="lm", se=F, formula=my.formula) +
      stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
                   formula = my.formula, parse = TRUE, label.y = 0.9) +
      ylim(0, 1) +
      facet_wrap(~datos$cristal)

It is also possible to pass a vector to `label.y` and `label.x`, so that each equation can be manually positioned for each panel.

    ggplot(datos, aes(x = corriente, y = dosis, colour = cristal)) +
      geom_point() +
      geom_smooth(method="lm", se=F, formula=my.formula) +
      stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
                   formula = my.formula, parse = TRUE,
                   label.y = c(rep(0.9, 6), rep(0.15, 2), 0.9)) +
      ylim(0, 0.95) +
      facet_wrap(~datos$cristal)

Question:

I have the following data frame:

         expected observed group
     1: 0.5371429   0.0000     1
     2: 1.3428571   1.3736     1
     3: 2.6857143   2.4554     1
     4: 5.3714286   3.6403     1
     5: 0.5294118   0.0000     2
     6: 1.3235294   1.1494     2
     7: 2.6470588   1.1364     2
     8: 5.2941176   4.9774     2
     9: 0.5201207   0.0000     3
    10: 1.3003018   1.4327     3
    11: 2.6006036   2.5918     3
    12: 5.2012072   8.0769     3
    13: 0.5155039   1.4851     4
    14: 1.2887597   1.0638     4
    15: 2.5775194   3.1700     4
    16: 5.1550388   6.2500     4
    17: 0.4976959   0.0000     5
    18: 1.2442396   1.2384     5
    19: 2.4884793   3.1073     5
    20: 4.9769585   4.8148     5

I would like to scatter plot each dataset according to group, so I have the following code:

    sp <- ggplot(new_df, aes(x = expected, y = observed, colour = group)) +
      geom_point()
    sp + scale_color_gradientn(colours = rainbow(5))

and receive the below plot:

My question is how to add a linear line (intercept = 0,0) to each of the different groups? meaning, that in the end, I'll have 5 linear lines in different colors representing each group on the same plot.

And, is there a way to show the equation for each line in a legend?

Answer:

You can get the linear lines and equation/R2 text with `geom_smooth` from `ggplot2` and `stat_poly_eq` from the `ggpmisc` package:

    dat <- "expected observed group
    0.5371429 0 1
    1.3428571 1.3736 1
    2.6857143 2.4554 1
    5.3714286 3.6403 1
    0.5294118 0 2
    1.3235294 1.1494 2
    2.6470588 1.1364 2
    5.2941176 4.9774 2
    0.5201207 0 3
    1.3003018 1.4327 3
    2.6006036 2.5918 3
    5.2012072 8.0769 3
    0.5155039 1.4851 4
    1.2887597 1.0638 4
    2.5775194 3.17 4
    5.1550388 6.25 4
    0.4976959 0 5
    1.2442396 1.2384 5
    2.4884793 3.1073 5
    4.9769585 4.8148 5
    "

    library(ggplot2)
    library(ggpmisc)

    df <- read.table(text = dat, header = TRUE)
    df$group <- factor(df$group)

    formula <- y ~ x # needed for ggpmisc's equation and R2 text

    # Put equation & R2 coef to the top left corner
    ggplot(df, aes(expected, observed, colour = group)) +
      geom_point(size = 2, alpha = 0.3) +
      geom_smooth(method = "lm", formula = formula, se = FALSE) +
      stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~~")),
                   label.x.npc = "left", label.y.npc = "top",
                   formula = formula, parse = TRUE, size = 4) +
      scale_color_brewer(palette = "Dark2") +
      theme_bw(base_size = 16)
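The question also asks for lines through the origin, which the formula `y ~ x` does not enforce. A minimal sketch of one way to do that is to drop the intercept with `y ~ 0 + x`; the per-group slopes are computed in base R so they can be checked independently of the plot. Whether a given ggpmisc version prints a correct equation label for a no-intercept formula is not something this sketch assumes, so only the lines are drawn.

```r
# Data copied from the question
dat <- data.frame(
  expected = c(0.5371429, 1.3428571, 2.6857143, 5.3714286,
               0.5294118, 1.3235294, 2.6470588, 5.2941176,
               0.5201207, 1.3003018, 2.6006036, 5.2012072,
               0.5155039, 1.2887597, 2.5775194, 5.1550388,
               0.4976959, 1.2442396, 2.4884793, 4.9769585),
  observed = c(0, 1.3736, 2.4554, 3.6403,
               0, 1.1494, 1.1364, 4.9774,
               0, 1.4327, 2.5918, 8.0769,
               1.4851, 1.0638, 3.17, 6.25,
               0, 1.2384, 3.1073, 4.8148),
  group = factor(rep(1:5, each = 4))
)

# Slope of a least-squares line forced through (0, 0), one per group
slopes <- sapply(split(dat, dat$group), function(d) {
  unname(coef(lm(observed ~ 0 + expected, data = d)))
})
slopes

# Optional plot, only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(expected, observed, colour = group)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ 0 + x, se = FALSE)
  # print(p) draws the plot
}
```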

Question:

I have `ggplot2` installed and `library(ggplot2)` runs. However, I get the following error when I try to run `library(ggpmisc)`. I have tried restarting R and reinstalling `ggplot2` to no avail. In addition, I can generate plots using `ggplot2` but can't access functions like `stat_poly_eq` since I can't load `ggpmisc`. I have the current version of RStudio installed.

    library(ggplot2)
    library(ggpmisc)

Error in library(ggpmisc) : there is no package called ‘ggpmisc’

Answer:

Have you installed the library `ggpmisc`? If you haven't, please run:

install.packages("ggpmisc")

and then run

library("ggpmisc")

and now everything should work!

As its CRAN page puts it, ggpmisc is a set of "Extensions to 'ggplot2' respecting the grammar of graphics paradigm", i.e. it isn't a part of ggplot2. That means the packages are not bundled together, so each one must be installed separately.
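For scripts that are shared with others, the install step can be guarded so that it only runs when the package is missing. This is a small sketch; `ensure_package` is a hypothetical helper name, and it assumes a CRAN mirror is configured when an install is actually needed.

```r
# Hypothetical helper: install a package only if it cannot be found,
# then report whether it is available
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Typical use (commented out so nothing is installed by accident):
# ensure_package("ggpmisc")
# library(ggpmisc)

# "stats" ships with R, so this returns TRUE without installing anything
ensure_package("stats")
```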

Question:

    df %>%
      group_by(approved_date) %>%
      summarise(rev=sum(gmv)) %>%
      ggplot(aes(x = approved_date, y = rev)) +
      geom_line() +
      geom_smooth(method = 'auto', se = FALSE) +
      labs(x = 'Date', y = 'Revenue', title = 'Revenue by Date') +
      scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) +
      stat_peaks(colour = "red", span = NULL) +
      stat_valleys(colour = "blue", span = NULL) +
      geom_text(aes(label = round(rev, 0)), vjust = "inward", hjust = "inward",
                show.legend = FALSE, check_overlap = TRUE)

I have this code, which labels all values, including every local maximum and minimum, when I run it. I want to label only the global maximum and global minimum. How do I do that?

Answer:

As the code in the question cannot be run for lack of data, I show an example slightly modified from the package User Guide. In this case this different example should be enough to work out the solution.

    library(ggpmisc)

    ggplot(lynx, as.numeric = FALSE) +
      geom_line() +
      stat_peaks(colour = "red") +
      stat_peaks(geom = "text", colour = "red", vjust = -0.5,
                 check_overlap = TRUE, span = NULL) +
      ylim(-100, 7300)

In other words, the geom `"text"` should be passed as an argument to `stat_peaks()` together with `span = NULL` to get a single label. If you add `geom_text()` directly, peaks are not selected; instead, all values stored in the variable mapped to the `label` aesthetic are added to the plot.
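If the goal is just the global maximum and minimum, the two rows can also be picked out by hand and passed to `geom_text()` as its own small data set, so that only those two values are labelled. This is a plain ggplot2 sketch and an alternative to the `stat_peaks()` route; the names `lynx_df` and `extremes` are made up for the example. With ggpmisc, `stat_valleys(geom = "text", span = NULL)` would presumably be the analogous call for the minimum.

```r
# lynx is a time series shipped with R; turn it into a data frame
lynx_df <- data.frame(year = as.numeric(time(lynx)),
                      count = as.numeric(lynx))

# Rows holding the global maximum and the global minimum
extremes <- lynx_df[c(which.max(lynx_df$count), which.min(lynx_df$count)), ]
extremes$kind <- c("max", "min")
extremes

# Optional plot, only built if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(lynx_df, aes(year, count)) +
    geom_line() +
    geom_point(data = extremes, aes(colour = kind)) +
    # only the two extreme rows are labelled
    geom_text(data = extremes, aes(label = count), vjust = -0.5)
  # print(p) draws the plot
}
```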