Hot questions for Using Ggplot2 in ggvis

Question:

I'm trying to reproduce a ggplot2 plot using ggvis. The plot aims at representing the coordinates of points (from a Correspondence Analysis) together with their clusters (hclust) Standard Dispersion Ellipses.


TL;DR

I'd like to make a ggvis plot with multiple layers based on multiple datasets. Thus, the functional/pipe approach stops me from grouping one of the layers and not the other.

The whole (briefly commented) code is here: https://gist.github.com/RCura/a135446cda079f4fbc10


Here's the code for creating the data:

 a <- rnorm(n = 100, mean = 50, sd = 5)

 b <- rnorm(n = 100, mean = 50, sd = 5)

 c <- rnorm(n = 100, mean = 50, sd = 5)

 mydf <- data.frame(A = a, B = b, C = c, row.names = c(1:100))

 library(ade4)

 myCA <- dudi.coa(df = mydf,scannf = FALSE,  nf = 2)

 myDist <- dist.dudi(myCA, amongrow = TRUE)

 myClust <- hclust(d = myDist, method = "ward.D2")

 myClusters <- cutree(tree = myClust, k = 3)

 myCAdata <- data.frame(Axis1 = myCA$li$Axis1, Axis2 = myCA$li$Axis2, Cluster = as.factor(myClusters))

 library(ellipse) # Compute Standard Deviation Ellipse

 df_ellipse <- data.frame()

 for(g in levels(myCAdata$Cluster)){
   df_ellipse <- rbind(df_ellipse,
                 cbind(as.data.frame(
                 with(myCAdata[myCAdata$Cluster==g,],
                 ellipse(cor(Axis1, Axis2),
                 level=0.7,
                 scale=c(sd(Axis1),sd(Axis2)),
                 centre=c(mean(Axis1),mean(Axis2))))),
                 Cluster=g))
 }

I can plot this through ggplot2:

library(ggplot2)

myPlot <- ggplot(data=myCAdata, aes(x=Axis1, y=Axis2,colour=Cluster)) +
  geom_point(size=1.5, alpha=.6) +
  geom_vline(xintercept = 0, colour="black",alpha = 0.5, linetype = "longdash" ) +
  geom_hline(yintercept = 0, colour="black", alpha = 0.5, linetype = "longdash" ) +
  geom_path(data=df_ellipse, aes(x=x, y=y,colour=Cluster), size=0.5, linetype=1)
myPlot

But I can't find how to plot this using ggvis.

I can plot the 2 different layers:

library(ggvis)

all_values <- function(x) { paste0(names(x), ": ", format(x), collapse = "<br />")}

 ggDF <- myCAdata

 ggDF$name <- row.names(ggDF)

## Coordinates plot
myCoordPlot <- ggvis(x = ~Axis1, y = ~Axis2, key := ~name, data = ggDF) %>%

  layer_points(size := 15, fill= ~Cluster, data = ggDF) %>%

  add_tooltip(all_values, "hover")

 myCoordPlot

## Ellipses plot (no tooltip requested)
 myEllPlot <- ggvis(data = df_ellipse, x = ~x,  y = ~ y) %>%

  group_by(Cluster) %>%

  layer_paths(x= ~x, y= ~y, stroke = ~Cluster, strokeWidth := 1)

 myEllPlot

But when I try to plot the 2 layers on the same plot:

 myFullPlot <- ggvis(data = df_ellipse, x = ~x,  y = ~ y) %>%

 layer_paths(x= ~x, y= ~y, stroke = ~Cluster, strokeWidth := 1) %>%

 layer_points(x = ~Axis1, y= ~Axis2, size := 15, fill= ~Cluster, data = ggDF) %>%

 add_tooltip(all_values, "hover")

 myFullPlot

The ellipses are not grouped, so the colours don't match and the ellipses are not separated. If I try to group my ellipses, it doesn't work: the group_by is only required by layer_paths, and it messes up layer_points.

Any idea how to make this work? And sorry for this very long post, but I've been trying to make this work for hours :/


Answer:

The problem is that when you try to combine the two, you do not group_by Cluster on the ellipse dataset. You need to do the following for it to work:

myFullPlot <- ggvis(data = df_ellipse, x = ~x, y = ~ y) %>% group_by(Cluster) %>%

  layer_paths(stroke = ~Cluster, strokeWidth := 1) %>%

  layer_points(x = ~Axis1, y= ~Axis2, size := 15, fill= ~Cluster, data = ggDF)

myFullPlot

And this way you get the graph you want!

P.S. I assume there is some randomness in your data creation because I got a different data set than yours.
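
On the randomness point: rnorm() draws new values on every run, so two people running the question's code will get different clusters and ellipses. A minimal sketch of how the data creation could be made reproducible, assuming a seed of 42 (an arbitrary choice, not taken from the original post) and a hypothetical helper named make_data:

```r
# Fix the random number generator state so rnorm() returns the same draws
# on every run. The seed value 42 is an arbitrary, illustrative choice.
make_data <- function(seed = 42) {
  set.seed(seed)
  data.frame(
    A = rnorm(n = 100, mean = 50, sd = 5),
    B = rnorm(n = 100, mean = 50, sd = 5),
    C = rnorm(n = 100, mean = 50, sd = 5)
  )
}

first_run  <- make_data()
second_run <- make_data()
identical(first_run, second_run)  # compares the two runs
```

Calling set.seed() once at the top of the original script should have the same effect.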

Question:

I have been looking into interactive plots in R. I know that there are several packages to create interactive plots, especially scatterplots, but I am looking for a certain functionality.

For example this plot. One can hover with the mouse over the buttons to get a small numerical summary of the data behind the point, i.e. a tooltip.

When you have a data set with more variables, it is often nice to explore/visualize scores from PCA, or do multidimensional scaling (MDS). But if one plots the data interactively, like the example above, the summary shown when hovering over a point does not give much information; it is just a long list of numbers. It would be nicer to generate a custom plot to display, instead of just the numerical values.

So my question is:

Is it possible, (within some of the packages available in R), to generate a visual summary when one hovers over a point in a scatter plot. This could be a barplot, or just some user-specified plot function, that takes one row from the data.frame as an argument.

If this is possible, then it would greatly help in understanding the results from MDS in a quick manner.

EDIT:

Here is some code to do MDS on the iris data set:

library(ggplot2)
library(plotly)
d <- dist(iris[,-5]) # euclidean distances between the rows
fit <- cmdscale(d,eig=TRUE, k=2) # k is the number of dim

# Put coordinates and original data in one data.frame
x <- fit$points[,1]
y <- fit$points[,2]
pDat <- data.frame(x=x,y=y)
pDat <- cbind(pDat,iris)
p <- ggplot(pDat) + geom_point(aes(x,y))
ggplotly(p)

First, now the tooltip only includes the x,y coordinates. I would like the tooltip to contain the values for the original 4 variables. Then instead of the original 4 variables behind the datapoint, I would like to display the tooltip as a barplot. The MDS preserves the distance between the data points, so one would be able to hover gradually with the mouse, and see the barplot, almost change continuously, because the distances are preserved. In my usage case I have 30 variables behind each point, so a barplot summary gives more visual information than 30 numerical values.


Answer:

If you're using RStudio, the plotly package should be friendly enough to use. For instance:

library(ggplot2)
library(plotly) 
p <- ggplot(iris, aes(Sepal.Length, Petal.Length, colour=Species)) + geom_point()
 ggplotly(p)

The information displayed when hovering over a point looks like:
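
For the first part of the question (showing the four original variables rather than only the x/y coordinates), one possible approach is to build a tooltip string per row in base R. This is a sketch under assumptions: the column name tip is made up for illustration, and it only produces text, not the bar plot the question ultimately asks for.

```r
# Multidimensional scaling of the four numeric iris columns, as in the question
d   <- dist(iris[, -5])
fit <- cmdscale(d, eig = TRUE, k = 2)
pDat <- cbind(data.frame(x = fit$points[, 1], y = fit$points[, 2]), iris)

# Build one HTML tooltip string per row from the original variables.
# `tip` is a hypothetical column name chosen for this sketch.
vars <- names(iris)[1:4]
pDat$tip <- apply(pDat[, vars], 1, function(row) {
  paste0(vars, ": ", format(row), collapse = "<br />")
})

pDat$tip[1]
```

The tip column can then be mapped to the text aesthetic, e.g. geom_point(aes(x, y, text = tip)), and selected with ggplotly(p, tooltip = "text"). ggplot2 may warn that text is an unknown aesthetic.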

Question:

Is there a way to make ggplot2's geom_density() function mimic the behavior of ggvis's layer_densities()? That is, make it so p1 looks like p3 (see below) without the call to xlim()? Specifically, I prefer the view that smooths the tails of the density curve.

library(ggvis)
library(ggplot2)

faithful %>% 
  ggvis(~waiting) %>% 
  layer_densities(fill := "green") -> p1

ggplot(faithful, aes(x = waiting)) +
  geom_density(fill = "green", alpha = 0.2) -> p2

ggplot(faithful, aes(x = waiting)) +
  geom_density(fill = "green", alpha = 0.2) +
  xlim(c(30, 110)) -> p3

p1
p2
p3

ggvis Output:

ggplot2 "default":

ggplot2 "desired":

Note: One can make ggvis mimic ggplot2 via the following (using trim=TRUE), but I would like to go the other direction...

faithful %>% 
  compute_density(~waiting, trim=TRUE) %>% 
  ggvis(~pred_, ~resp_) %>% 
  layer_lines()

Answer:

How about calling xlim, but with limits that are defined programmatically?

l <- density(faithful$waiting)
ggplot(faithful, aes(x = waiting)) +
  geom_density(fill = "green", alpha = 0.2) +
  xlim(range(l$x))

The downside is double density estimation though, so keep that in mind.
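
The double density estimation can probably be avoided. With its defaults (bw = "nrd0", cut = 3), density() evaluates the curve from three bandwidths below the minimum to three bandwidths above the maximum, so the same limits can be computed from the bandwidth alone. A sketch, assuming geom_density() is also left at its default bandwidth:

```r
w <- faithful$waiting

# density() uses bw = "nrd0" and cut = 3 by default, so its evaluation grid
# runs from min(w) - 3 * bw to max(w) + 3 * bw.
bw   <- bw.nrd0(w)
lims <- range(w) + c(-3, 3) * bw

# Compare with the limits taken from a full density estimate
all.equal(lims, range(density(w)$x))
```

The resulting lims vector can be passed to xlim() in place of range(l$x).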

Question:

I have read a similar post on SO, but was not able to adapt the answer to my specific case. I am working with time series data and would like to combine two different data sets into the same plot. Although I could combine the data into one dataframe, I am really interested in understanding how to reference multiple datasets.

Mock Data:

require(ggvis)

dfa <- data.frame(
date_a = seq(from= as.Date("2015-06-10"), 
        to= as.Date("2015-07-01"), by= 1),
val_a = c(2585.150, 2482.200, 3780.186, 3619.601, 
        0.000, 0.000, 3509.734, 3020.405, 
        3271.897, 3019.003, 3172.084, 0.000, 
        0.000, 3319.927, 2673.428, 3331.382, 
        3886.957, 2859.887, 0.000, 0.000, 
        2781.443, 2847.377) )

dfb <- data.frame(
date_b = seq(from= as.Date("2015-07-02"), 
        to= as.Date("2015-07-15"), by= 1),
val_b = c(3250.75429, 3505.43477, 3208.69141,
        -2.08175, -27.30244, 3324.62348, 
        2820.91075, 3250.75429, 3505.43477,
        3208.69141, -2.08175, -27.30244,
        3324.62348, 2820.91075) )

Using the data provided above, I am able to create separate plots with the code below:

Separate Plots: (Works)

dfa %>%
ggvis( x= ~date_a , y= ~val_a, stroke := "black", opacity := 0.5 ) %>% 
    scale_datetime("x", nice = "month", domain = c(as.Date("2015-06-10"),
    as.Date("2015-07-15") )) %>%
    layer_lines() %>% layer_points( fill := "black" )

dfb %>%
ggvis( x= ~date_b , y= ~val_b, stroke := "red", opacity := 0.5 ) %>% 
    scale_datetime("x", nice = "month", domain = c(as.Date("2015-06-10"),
    as.Date("2015-07-15") )) %>%
    layer_lines() %>% layer_points( fill := "red" )

The desired output is these two lines (black and red) to be on the same plot. Here are a couple of unsuccessful attempts:

Attempt #1 adapted from SO post:

ggvis( data = dfa, x = ~date_a, y = ~val_a) %>% layer_lines(stroke := "black",  opacity := 0.5 ) %>%
    layer_lines( data = dfb, x= ~date_b , y= ~val_b, stroke := "red", 
    opacity := 0.5 ) %>% 
    scale_datetime("x", nice = "month", domain = c(as.Date("2015-06-10"), 
    as.Date("2015-07-15") )) 

## Error in new_prop.default(x, property, scale, offset, mult, env, event,  : 
##  Unknown input to prop: c(16618, 16619, 16620, 16621, 16622, 16623, 16624, ...

Attempt #2 based on RStudio documentation:

ggvis( data = NULL, x = ~date_a, y = ~val_a) %>%
    layer_lines(stroke := "black",  opacity := 0.5, data = dfa ) %>%
    layer_lines( x= ~date_b , y= ~val_b, stroke := "red", 
    opacity := 0.5, data = dfb ) %>% 
    scale_datetime("x", nice = "month", domain = c(as.Date("2015-06-10"), 
    as.Date("2015-07-15") )) 

## Error in func() : attempt to apply non-function

Here is a minimalistic implementation in ggplot2:

require(ggplot2)

ggplot() + 
  geom_line(data = dfa, aes(x = date_a, y = val_a ), colour = "black") +     
  geom_line(data = dfb, aes(x = date_b, y = val_b ), colour = "red") 

Again, a working solution and brief explanation would be greatly appreciated. Thank you in advance for the help.


Answer:

Well, it looks like layer_lines may not be properly taking the data argument. I think you can successfully use layer_paths here. They work similarly, but layer_paths draws in the order of the data, so you'd need to make sure your time series are sorted correctly before plotting.

First, when I look at the basic layer_paths function, it has a specific data argument, like many other layer functions.

layer_paths
function (vis, ..., data = NULL) 
{
    add_mark(vis, "line", props(..., env = parent.frame()), data, 
        deparse2(substitute(data)))
}
<environment: namespace:ggvis>

While layer_lines has the ... for more arguments, it doesn't have a data argument and it doesn't seem like things work with it.

layer_lines
function (vis, ...) 
{
    x_var <- vis$cur_props$x$value
    layer_f(vis, function(x) {
        x <- auto_group(x, exclude = c("x", "y"))
        x <- dplyr::arrange_(x, x_var)
        emit_paths(x, props(...))
    })
}
<environment: namespace:ggvis>

To test, I made a really basic graph, trying to use the data argument in layer_lines.

ggvis() %>%
    layer_lines(data = dfb, x= ~date_b , y= ~val_b, stroke := "red") 

This fails with an error.

Error in func() : attempt to apply non-function

Here's the same code using layer_paths instead:

ggvis() %>%
    layer_paths(data = dfb, x= ~date_b , y= ~val_b, stroke := "red") 

So, that works, which means as long as you order your dataset by your dates your graphic should work fine by just replacing layer_lines with layer_paths.

ggvis(data = dfa, x = ~date_a, y = ~val_a) %>% 
    layer_paths(stroke := "black",  opacity := 0.5 ) %>%
    layer_paths(data = dfb, x = ~date_b , y= ~val_b, stroke := "red", 
                opacity := 0.5 ) %>% 
    scale_datetime("x", nice = "month", domain = c(as.Date("2015-06-10"), as.Date("2015-07-15") )) 

This seemed odd to me, and I may have missed something. I didn't see anything in the open or closed issues on the ggvis GitHub page, so you might consider filing one.
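
Since layer_paths connects points in row order, it is worth sorting each data frame by date first. A minimal base R sketch with made-up, deliberately out-of-order values (not the question's data):

```r
# Small out-of-order time series (illustrative values, not the question's data)
dfb <- data.frame(
  date_b = as.Date(c("2015-07-04", "2015-07-02", "2015-07-03")),
  val_b  = c(3208.7, 3250.8, 3505.4)
)

# Sort rows by date so a path drawn in row order runs left to right
dfb_sorted <- dfb[order(dfb$date_b), ]
dfb_sorted
```

The mock data in the question is already in date order, so this step only matters for data that arrives unsorted.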

Question:

ggvis will automatically colour my plot based on a factor column I pass it. So if my factor column was named "area" I could write this and it would execute perfectly.

names = c("Bacilli", "Actinobacteria", "area")
b_counts <- c(1,5,8,100,34,3)
a_counts <- c(1,3,11,55,67,11)
area <- c("Gut", "Skin", "Gut", "Gut", "Skin", "Oral")
rel_data <- data.frame(b_counts, a_counts, area)
names(rel_data) <- names

library(ggvis)
library(dplyr)

rel_data %>% ggvis(x = input_select(names(rel_data[,-3]), map = as.name, label = "X Axis"), 
               y = input_select(names(rel_data[,-3]), map = as.name, label = "Y Axis")) %>%
  filter(area %in% eval(input_checkboxgroup(unique(rel_data$area), selected = "Gut"))) %>%
  layer_points(fill = ~area) ### section of interest

However, if I want to pass the name of the column as a string, I can't get it to work. e.g.

region <- "area"
layer_points(fill = ~region)

I've tried as.name, eval, quote, etc but I can't seem to get anything to work. Does anyone have any ideas?


Answer:

There's a hint at properties and scales:

layer_points(prop("fill", as.name(region)))
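
To see why ~region fails while as.name(region) works, it helps to look at what each expression holds. The following base R sketch rebuilds the question's data with the column names set directly, and does not use ggvis itself:

```r
rel_data <- data.frame(
  Bacilli        = c(1, 5, 8, 100, 34, 3),
  Actinobacteria = c(1, 3, 11, 55, 67, 11),
  area           = c("Gut", "Skin", "Gut", "Gut", "Skin", "Oral")
)
region <- "area"

# A formula stores the literal symbol that was typed, not the value of `region`
all.vars(~region)

# as.name() turns the string into the symbol `area`, which can be looked up
# as a column when evaluated inside the data frame
sym <- as.name(region)
eval(sym, rel_data)
```

So ~region asks for a column literally called region, whereas the symbol built by as.name() refers to the area column.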

Question:

Here is the sample data set for the plot that I have tried:

x<-runif(3, min=4, max=50)
y<-runif(6, min=3, max=14)

x1 <-runif(8, min=7, max=52)
y1 <-runif(5, min=5, max=18)

I can plot a smooth line using the following code:

qplot(y,x, geom='smooth', span =0.05)
qplot(y1, x1, geom='smooth', span =0.05)

But they are drawn in two separate plots. How can I draw both smooth lines on the same plot, on different layers?


Answer:

You have some problems with your example as pointed out in the comments

set.seed(1)
x <- sort(runif(20, min=4, max=50))
y <- sort(runif(20, min=3, max=14))

x1 <-sort(runif(20, min=7, max=52))
y1 <-sort(runif(20, min=5, max=18))

You can use qplot and string a bunch of layers together

library(ggplot2)
qplot(x, y) + geom_smooth(aes(x, y)) + geom_point(aes(x1, y1)) + geom_smooth(aes(x1, y1))

But it is easier to use ggplot once you have the data in the proper format

dd <- data.frame(x, x1, y, y1)
ll <- reshape(dd, dir = 'long', varying = list(1:2, 3:4))

ggplot(ll, aes(x, y, group = time)) + geom_point() + geom_smooth()
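
To make the "proper format" concrete, here is a sketch of what reshape() produces. The data is regenerated for illustration, so the random values will not match the answer's exactly.

```r
set.seed(1)
dd <- data.frame(
  x  = sort(runif(20, min = 4, max = 50)),
  x1 = sort(runif(20, min = 7, max = 52)),
  y  = sort(runif(20, min = 3, max = 14)),
  y1 = sort(runif(20, min = 5, max = 18))
)

# Stack (x, x1) into one x column and (y, y1) into one y column;
# the added `time` column records which series each row came from
ll <- reshape(dd, direction = "long", varying = list(1:2, 3:4))

names(ll)
table(ll$time)
```

Because time identifies the series, mapping it to an aesthetic such as colour = factor(time) should also distinguish the two smooths visually.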

Question:

I am fairly new to R and therefore have to bother you with a basic question.

I have two large panel datasets (60 variables, each for 30 countries, ranging over a period from 1950-2013). The 60 variables have identical names, the data may or may not differ.

My final goal is to create 60 grids with 30 plots each: each grid refers to one of the 60 variables and contains a plot for each country. Each plot will contain 2 line graphs, one from the first data frame and one of the second (for the same variable each).

I have done this in Stata before, using global vars and a simple loop. I am stuck in trying to make this work in R.

I cast the data into wide format for now (columns: Date, Country, Indicator1,...Indicator60), but have read that ggplot2 does better with long formats(?).

My main issue is how to loop at all (for, lapply, function...).

If not an answer, I would hugely appreciate ideas or hints at how to approach this problem, so that I would manage to ask more specific questions, if needed.

Edit: below a reproducible sample of the data, as requested

year <- c(2010, 2011, 2012, 2013, 2010, 2011, 2012, 2013,2010, 2011, 2012,     
    2013, 2010, 2011, 2012, 2013, 2010, 2011, 2012, 2013, 2010, 2011, 2012,    
    2013, 2010, 2011, 2012, 2013, 2010, 2011, 2012, 2013)
country <- c(rep("Australia", times =8), rep("Canada", times = 8),  
    rep("Australia", times =8), rep("Canada", times = 8))
indicator <- c(rep("Apples", times = 16), rep("Bananas", times = 16))
versiondata <- c(rep("new", times = 4), rep("old", times = 4), rep("new",  
    times = 4), rep("old", times = 4), rep("new", times = 4), rep("old", 
    times = 4), rep("new", times = 4), rep("old", times = 4))
value <- runif(32)
mydf <- data.frame(year, country, indicator, versiondata, value)  

I am still stuck at the exact expression of the do. I came up with this sorry bit, where I do not know how to specify the two y-variables (corresponding to old and new from the column versiondata).

mydf %>%
  group_by(indicator) %>%
  do({
    p <- ggplot(., aes(x=year)) + 
      geom_line(aes(y = ???)) 
    + facet_wrap(~country) + ggtitle("indicator")
    })

Answer:

A fairly standard approach for this kind of thing would be:

by(mydf, mydf$indicator, function(X) ggplot(X, aes(year, value, color = versiondata)) + geom_line() + facet_wrap(~country))

Using the indicator name as a title can take a little more finesse:

lapply(unique(mydf$indicator), function(X) ggplot(mydf[mydf$indicator == X,], aes(year, value, color = versiondata)) + geom_line() + facet_wrap(~country) + labs(title = X))

Should look like this for each indicator:
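
Both by() and the lapply() over unique indicators work by cutting the data into one piece per indicator. A base R sketch of that splitting step, with the question's sample data rebuilt more compactly using rep():

```r
mydf <- data.frame(
  year        = rep(2010:2013, times = 8),
  country     = rep(rep(c("Australia", "Canada"), each = 8), times = 2),
  indicator   = rep(c("Apples", "Bananas"), each = 16),
  versiondata = rep(rep(c("new", "old"), each = 4), times = 4),
  value       = runif(32)
)

# One data frame per indicator; the list names are the indicator names,
# which is what makes them available for plot titles
pieces <- split(mydf, mydf$indicator)
names(pieces)
sapply(pieces, nrow)
```

Iterating over names(pieces), or over both pieces and names(pieces) with Map(), is one way to give each plotting call the indicator name along with its data.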

Question:

I used to make interaction plots with ggplot2, using the code given below. Now I want to reproduce the same plot with ggvis, but my attempt (shown below) does not give the same output as ggplot2. How can I get the same plot with ggvis?

library(ggplot2)
p <- qplot(as.factor(dose), len, data=ToothGrowth, geom = "boxplot", color = supp) + theme_bw()
p <- p + labs(x="Dose", y="Response")
p <- p + stat_summary(fun.y = mean, geom = "point", color = "blue", aes(group=supp))
p <- p + stat_summary(fun.y = mean, geom = "line", aes(group = supp))
p <- p  + theme(axis.title.x = element_text(size = 12, hjust = 0.54, vjust = 0))
p <- p  + theme(axis.title.y = element_text(size = 12, angle = 90,  vjust = 0.25))
print(p)

library(ggvis)
ggvis(data=ToothGrowth, x= ~as.factor(dose), y= ~len, fill= ~supp, stroke = ~supp) %>% 
  layer_points(shape=~supp) %>% 
  layer_lines(fillOpacity=0)


Answer:

The basic problem, when trying to implement this in ggvis, is that there is no position = dodge option like in ggplot2, and therefore the boxplots for different supp values cannot be plotted at the same x coordinate. So indexing the x axis by as.factor(dose) doesn't appear to be an option. However, what we can do is use an integer index of length equal to the number of unique dose values, and then manually offset the x position of the data to the left or right, according to the supp value:

library(ggvis)
library(dplyr)
d <- ToothGrowth
d$xpos <- as.integer(factor(d$dose)) + ifelse(d$supp == "OJ", -.2, .2)

So we can now use x = ~xpos to plot the boxplots at the right positions. The next step is to define the data holding the means used to plot the points that are connected by lines.

means <- d %>% group_by(dose, supp) %>% summarise(len = mean(len))
means$xpos <- as.integer(factor(means$dose))
means <- group_by(means, supp) # The grouping is needed for layer_paths()

The graph can now be obtained as

ggvis(d, x = ~xpos, y = ~len, stroke = ~supp) %>% 
    layer_boxplots() %>%
    layer_points(data = means, fill := "blue") %>%
    layer_paths(data = means)
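
The manual dodge can be checked without ggvis. The sketch below recomputes the offsets and, as an assumption for illustration, uses base R's aggregate() in place of the dplyr pipeline to get the group means:

```r
d <- ToothGrowth
# Integer position of each dose level (1, 2, 3), shifted left for OJ and right for VC
d$xpos <- as.integer(factor(d$dose)) + ifelse(d$supp == "OJ", -0.2, 0.2)

# Distinct x positions per dose/supp combination
unique(d[order(d$xpos), c("dose", "supp", "xpos")])

# Base R equivalent of the group means (aggregate() instead of dplyr)
means <- aggregate(len ~ dose + supp, data = d, FUN = mean)
means$xpos <- as.integer(factor(means$dose))
means
```

The boxplots land at six positions, two per dose, while the means stay on the integer positions between them.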

Now we have the problem that the x position of the plots will be at 1, 2, 3 rather than the actual dose values. This is not very straightforward to overcome because add_axis() gives no way to re-label the axis ticks (also, we couldn't have used the actual dose values instead of 1, 2, 3 in the first place because that would have placed the boxplots at dose values 0.5 and 1 closer to each other than the ones at dose values 1 and 2). This can be overcome by a not so elegant hack, which is to add an axis for each single dose value. The function add_axis() gives a way to modify the axis properties (which includes the labels) but it will use the same label for the whole axis, since the properties apply to the whole axis. So by adding an axis for each dose value, we can manipulate the labels one by one. This looks like

ggvis(d, x = ~xpos, y = ~len, stroke = ~supp) %>% 
    layer_boxplots() %>%
    layer_points(data = means, fill := "blue") %>%
    layer_paths(data = means) %>%
    add_axis("x", title = "Dose", 
        values = c(1, 1), # For some reason values of length 1 don't work...
        properties = axis_props(labels = list(text = "0.5"))) %>%
    add_axis("x", title = "", 
        values = c(2, 2), 
        properties = axis_props(labels = list(text = "1"))) %>%
    add_axis("x", title = "", 
        values = c(3, 3), 
        properties = axis_props(labels = list(text = "2"))) %>%     
    add_axis("y", title = "Response")

Alternatively, you can use a loop for these so you don't have to type the same thing over and over

labs <- data.frame(dose = unique(d$dose))
labs$xpos <- as.integer(factor(labs$dose))

v <- ggvis(d, x = ~xpos, y = ~len, stroke = ~supp) %>% 
    layer_boxplots() %>%
    layer_points(data = means, fill := "blue") %>%
    layer_paths(data = means) %>%
    add_axis("x", title = "Dose", ticks = 0) %>%
    add_axis("y", title = "Response")

for (i in 1:nrow(labs)) {
    v <- add_axis(v, "x", title = "", values = rep(labs[i, "xpos"], 2),
        properties = axis_props(labels = list(text = as.character(labs[i, "dose"]))))
}

The final outcome looks like this

Question:

Let's say that the data.table dt looks like this:

library(data.table)
dt <- data.table(grp = c("01", "01","01", "01", "01", "01", "01", "01", "02", "02", "02",
                     "03", "03", "03",
                     "04", "04", "04", "04"),
             date = c("2012-04-18", "2012-04-19","2012-04-30", "2012-05-10", "2012-06-23", "2012-06-25", 
                      "2012-07-05", "2012-07-06", 
                      "2012-04-07", "2012-04-19", "2012-04-05",
                      "2012-04-04", "2012-04-22", "2012-04-25", 
                      "2012-05-19", "2012-06-05", "2012-06-26", "2012-06-27"))



> dt
    grp       date
 1:  01 2012-04-18
 2:  01 2012-04-19
 3:  01 2012-04-30
 4:  01 2012-05-10
 5:  01 2012-06-23
 6:  01 2012-06-25
 7:  01 2012-07-05
 8:  01 2012-07-06
 9:  02 2012-04-07
10:  02 2012-04-19
11:  02 2012-04-05
12:  03 2012-04-04
13:  03 2012-04-22
14:  03 2012-04-25
15:  04 2012-05-19
16:  04 2012-06-05
17:  04 2012-06-26
18:  04 2012-06-27

I want to create a plot for each of the groups grp highlighting the weeks for which I have records. I wanted a chart something like this:

So I tried the following but it is only putting a | on the days i have records

ggplot(dt) +
   aes(y = grp, x = as.Date(date)) +
   geom_segment(aes(yend = grp, 
                    xend = as.Date(date), 
                    color = grp), 
                size = 6,
                show.legend = FALSE) +
   geom_text(aes(label = grp), 
             nudge_x = 3,
             size = 5) +
   scale_x_date('Date', date_breaks = '7 days', expand = c(0, 2)) +
   scale_color_brewer(palette = 'Set3') +
   theme_bw() +
   theme(axis.line.y = element_blank(),
         axis.text.y = element_blank(),
         axis.ticks.y = element_blank())

Right now my plot is looking like this: How can I improve my plot to the desired one?


Answer:

Would this work for you? For each date in dt it grabs the start and the end of the week that the date falls in. Then it plots a line segment for each start/end week combination. Per Henrik's comment, I used floor_date/ceiling_date. I need to wrap ceiling_date in as.Date because it rounds the week up to the 20th hour, so it returns an object of class POSIXct.

library(lubridate)

dt$start_week <- floor_date(as.Date(dt$date), unit = "week")
dt$end_week <- as.Date(ceiling_date(as.Date(dt$date), unit = "week"))

 ggplot(dt, aes(y = grp)) + geom_segment(aes(x = start_week, xend = end_week, yend = grp)) + 
  geom_text(aes(x = as.Date(date), label = grp), nudge_x = 3, size = 5) + 
  theme_bw()+
  theme(axis.text.x=element_text(angle=45,hjust=1,vjust=1))+ 
  scale_x_date('Date', date_breaks = '7 days', expand = c(0, 2)) + 
  scale_color_brewer(palette = 'Set3') + 
  theme(axis.line.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank())
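
For readers without lubridate, the week boundaries can be approximated in base R. This sketch assumes weeks start on Sunday and is an illustration of the idea rather than a drop-in replacement for floor_date/ceiling_date:

```r
dates <- as.Date(c("2012-04-18", "2012-04-19", "2012-04-30"))

# as.POSIXlt()$wday is the day of the week, 0 = Sunday ... 6 = Saturday.
# Subtracting it steps back to the Sunday that starts each week.
start_week <- dates - as.POSIXlt(dates)$wday
end_week   <- start_week + 7   # the following Sunday

data.frame(date = dates, start_week, end_week)
```

Here end_week is always a Date, so no conversion from POSIXct is needed.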

Question:

Given a dataset with a factor column (X1) and a subtotal column (X2)

  X1 X2 
1  1  12  
2  2  200 
3  3  23  
4  4  86  
5  5  141  

I would like to create a graphic like this:

which gives X2 as a percentage of the X2 total, divided by X1.

Edit: clarity and adding dataset for reproducibility


Answer:

For example

set.seed(1234)
df <- data.frame(x = 1:6)
df$y <- runif(nrow(df))
df$type <- sample(letters, nrow(df))
ggplot(df, aes(x+-.5, y, fill=type)) + 
  geom_bar(stat="identity", width=1) + 
  coord_polar(start = pi/2) + 
  scale_x_continuous(limits = c(0, nrow(df)*2)) + 
  geom_text(aes(label=scales::percent(y))) + 
  ggthemes::theme_map() + theme(legend.position = c(0,.15))

gives you
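
The answer uses random data. Applied to the data from the question, the percentage step might look like the sketch below, where the column names share and label are made up for illustration:

```r
df <- data.frame(X1 = factor(1:5), X2 = c(12, 200, 23, 86, 141))

# Share of each subtotal in the overall total
df$share <- df$X2 / sum(df$X2)

# Human-readable labels, e.g. for geom_text()
df$label <- sprintf("%.1f%%", 100 * df$share)
df
```

The label column could then be passed to geom_text(aes(label = label)) in place of scales::percent(y).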

Question:

When I integrate tables and figures in a document using knitr, adding the code makes it more reproducible and interesting.

Often a combination of dplyr and ggvis can make a plot that has relatively legible code (using the magrittr pipe operator %>%).

mtcars %>%
  group_by(cyl, am) %>%
  summarise( weight = mean(wt) ) %>%
  ggvis(x=~am, y=~weight, fill=~cyl) %>%
  layer_bars()

The problem is that the ggvis plot:

does not look quite as pretty as the ggplot2 plot (I know, factoring of cyl):

However, for ggplot2 we need:

mtcars %>%
  group_by(am, cyl) %>%
  summarise( weight = mean(wt) ) %>%
  ggplot( aes(x=am, y=weight, fill=cyl) ) +
  geom_bar(stat='identity')

My problem is that this switches from %>% to + for piping. I know this is a very minor itch, but I would much prefer to use:

mtcars %>%
  group_by(am, cyl) %>%
  summarise( weight = mean(wt) ) %>%
  ggplot( aes(x=am, y=weight, fill=cyl) ) %>%
  geom_bar(stat='identity')

Is there a way to modify the behaviour of ggplot2 so that this would work?

P.S. I don't like the idea of using magrittr's add(), since this again makes the code more complicated to read.


Answer:

This would be too long to expand in the comments, and based on your answer I am not sure whether you tried the bit of code I provided and it didn't work, or you tried something similar previously and didn't manage:

geom_barw<-function(DF,x,y,fill,stat){
   require(ggplot2)
   p<-ggplot(DF,aes_string(x=x,y=y,fill=fill)) + geom_bar(stat=stat)
   return(p)
}
library(magrittr)
library(dplyr)
library(ggplot2)

mtcars %>%
group_by(cyl, am) %>%
summarise( weight = mean(wt) ) %>%
geom_barw(x='am', y='weight', fill='cyl', stat='identity')

This works for me with: dplyr_0.4.2 ggplot2_2.1.0 magrittr_1.5

Of course geom_barw could be modified so you don't need to use the quotes anymore.

EDIT: There should be a more elegant and safer way with lazy (see the lazyeval package), but a very quick adaptation would be to use substitute (as pointed out by Axeman, however without the deparse part):

 geom_barw<-function(DF,x,y,fill,stat){
    require(ggplot2)

    x<-substitute(x)
    y<-substitute(y)
    fill<-substitute(fill)

    p<- ggplot(DF,aes_string(x=x,y=y,fill=fill))
    p<- p + geom_bar(stat=stat)
    return(p)
}
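
As background on the substitute trick, the following base R sketch shows how unquoted arguments can be captured, and what the deparse step mentioned above would add. The function name capture_names is hypothetical.

```r
# substitute() returns the expression the caller typed, without evaluating it;
# deparse() converts that expression to a string
capture_names <- function(x, y) {
  c(x = deparse(substitute(x)), y = deparse(substitute(y)))
}

capture_names(am, weight)  # `am` and `weight` need not exist as objects
```

Without deparse(), substitute() yields symbols rather than strings, which is the form the adapted geom_barw above passes on.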

Question:

Pretty straightforward:

  1. This does not work

    iris %>%  
    ggvis(x= ~Sepal.Length, y = ~Sepal.Width, fill=~Sepal.Length) %>%
    layer_bars()
    
  2. This does

    iris %>%  
    ggvis(x= ~Sepal.Length, y = ~Sepal.Width, fill=~Sepal.Length) %>% 
    layer_points()
    

Why?

I actually managed to use the fill aesthetic with another dataset that I am not sharing, but that's just to point out that the fill should definitely work in my replicable example, right?


Answer:

From ?layer_bars

If grouping var is continuous, you need to manually specify grouping

iris %>%  
  group_by(Sepal.Length) %>%
  ggvis(x= ~Sepal.Length, y = ~Sepal.Width, fill = ~Sepal.Length) %>%
  layer_bars()

Which gives:

Question:

I'm trying to layer points over a boxplot. Both the points and the boxplot come from the same data source: db_gems_spend. The only difference is how they are filtered (range of dates for boxplot and a single day for points). The end goal is to add interactivity to the graph so that I will be able to select a date and immediately see how the day compares to other days by seeing where the point lands on a particular box plot.

The problem is that the points do not currently align with the box plots.

You can see it here:

This is the code:

db_gems_spend %>%
  filter(dayofweek == "Fri") %>% # add interactivity (automate dayofweek selection)
  filter(date >= "2015-08-01") %>% # add interactivity
  ggvis(~action_type, ~count) %>%
  layer_boxplots() %>%
  add_axis("x", title = "action_type", title_offset = 50, 
           properties = axis_props(labels = list(angle = 20, align = "left", fontSize = 10))) %>%
  add_axis("y", title = "count", title_offset = 60) %>%
  add_data(db_gems_spend) %>%
  filter(date == "2015-11-04") %>% # add interactivity
  layer_points(x = ~action_type, y = ~count, fill :=  "red")

How can I get these points to align?


Answer:

db_gems_spend %>%
  ggvis(~action_type, ~(count/total_spend)) %>%
  layer_boxplots() %>%
  add_data(db_gems_spend) %>%
  layer_points(x = ~action_type, y = ~count, fill := "red", 
    prop("x", ~action_type, scale = "xcenter"))

Thanks aosmith, the solution on github was what I was looking for. It turns out ggvis will align layer_points to the left of layer_boxplots if the values are categorical and not numerical unless you specify the last line of code from above.

Question:

I'm interested in converting some graphs from ggplot to ggvis, but there is relatively little information on some of the ggvis functionality.

I have a graph of bit rates that I need plotted in log scale with nicely formatted labels.

Here's the code to do it in ggplot:

require(data.table)  # data.table_1.9.2
require(magrittr)    # magrittr_1.0.1
require(ggplot2)     # ggplot2_1.0.0
require(ggvis)       # ggvis_0.3.0.99

# Management requires nice labels on their graphs
format_si = function(unit="", ...) {
  # Returns a function that formats its input in SI-style (powers of ten)
  # The function inserts the supplied unit
  function(x) {
    limits <- c(1e-24, 1e-21, 1e-18, 1e-15, 1e-12, 1e-9,  1e-6,  1e-3,  1e0,  1e3,  1e6,  1e9,  1e12,  1e15,  1e18, 1e21,  1e24)
    prefix <- c(" y",  " z",  " a",  " f",  " p",  " n",  " µ",  " m",  " ",  " k", " M", " G", " T",  " P",  " E", " Z",  " Y")
    i <- findInterval(abs(x), limits)
    i <- ifelse(i==0, which(limits == 1e0), i)
    paste(format(round(x/limits[i], 1), trim=TRUE, scientific=FALSE, ...), prefix[i], unit, sep="")
  }   
}   

# Create some sample data.
data = data.table(bitrate=rgamma(200, shape = 1.5, scale = 5e6))

# Make a wonderful ggplot2 bitmap graph.
data %>%
    ggplot(aes(bitrate)) +
    geom_bar(stat="bin", binwidth=.5, aes(y=..density..), fill="#ccccff", color="black") +
    scale_x_log10(breaks=10^(4:9), labels=format_si('bit/s')(10^(4:9)), limits=c(10^4,10^9))

Trying to create the basic ggvis plot works:

# Create a non-log ggvis super awesome web graph.
data.table(bitrate=rgamma(200, shape = 1.5, scale = 5e6)) %>%
    ggvis(x = ~bitrate) %>%
    compute_bin(~bitrate) %>%
    layer_rects(x = ~xmin_, x2 = ~xmax_, y=~count_, y2=0) 

But if we simply add a log scale then boom, now the graph is blank:

# Try to add a log scale.
data.table(bitrate=rgamma(200, shape = 1.5, scale = 5e6)) %>%
    ggvis(x = ~bitrate) %>%
    compute_bin(~bitrate) %>%
    layer_rects(x = ~xmin_, x2 = ~xmax_, y=~count_, y2=0) %>%
    scale_numeric("x", trans="log") 

Is it possible to recreate the ggplot graph in ggvis? What about adding labelled breaks, or a label formatting function to ggvis axes?


Answer:

Try adding expand = 0:

scale_numeric("x", trans = "log", expand = 0)

Question:

I am trying to create a shiny app where depending on the dataset, ggvis will create a scatter plot. The app works fine at the beginning. But if I try to change the dataset to mtcars, shiny just disappears.

My ui.R -

library(ggvis)
library(shiny)
th.dat <<- rock

shinyUI(fluidPage(

  titlePanel("Reactivity"),

  sidebarLayout(
    sidebarPanel(
      selectInput("dataset", "Choose a dataset:",
                  choices = c("rock", "mtcars")),
      selectInput("xvar", "Choose x", choices = names(th.dat), selected = names(th.dat)[1]),
      selectInput("yvar", "Choose y", choices = names(th.dat), selected = names(th.dat)[2]),
      selectInput("idvar", "Choose id", choices = names(th.dat), selected = names(th.dat)[3])
    ),

    mainPanel(
      ggvisOutput("yup")
    )
  )
))

server.R -

library(ggvis)
library(shiny)
library(datasets)

shinyServer(function(input, output, session) {

  datasetInput <- reactive({
    switch(input$dataset,
           "rock" = rock,
           "mtcars" = mtcars)

  })


  obs <- observe({
    input$dataset
    th.dat <<- datasetInput()
    s_options <- list()
    s_options <- colnames(th.dat)

    updateSelectInput(session, "xvar",
                      choices = s_options,
                      selected = s_options[[1]]
    )
    updateSelectInput(session, "yvar",
                      choices = s_options,
                      selected = s_options[[2]]
    )
    updateSelectInput(session, "idvar",
                      choices = s_options,
                      selected = s_options[[3]]
    )
  })

  xvarInput <- reactive({
    input$dataset
    input$xvar

    print("inside x reactive," )
    print(input$xvar)

    xvar <- input$xvar
  })

  yvarInput <- reactive({
    input$dataset
    input$yvar

    print("inside y reactive,")
    print(input$yvar)

    yvar <- input$yvar
  })


  dat <- reactive({

    dset <- datasetInput()
    xvar <- xvarInput()
#    print(xvar)
    yvar <- yvarInput()
#    print(yvar)

    x <- dset[, xvar]
    y <- dset[,yvar]
    df <- data.frame(x = x, y = y)
  })

  dat %>%
    ggvis(~x, ~y) %>%
    layer_points() %>%
    bind_shiny("yup")
})

I have tried many approaches but am still stuck. Any help will be greatly appreciated.


Answer:

I left some pointers in the comments. It seems that ggvis evaluates everything quite early, before the inputs are ready, so the code needs a few guard checks.

rm(list = ls())
library(shiny)
library(ggvis)

ui <- fluidPage(
  titlePanel("Reactivity"),
  sidebarPanel(
    selectInput("dataset", "Choose a dataset:", choices = c("rock", "mtcars")),
    uiOutput("xvar2"),uiOutput("yvar2"),uiOutput("idvar2")),
    mainPanel(ggvisOutput("yup"))
)

server <- (function(input, output, session) {

  dataSource <- reactive({switch(input$dataset,"rock" = rock,"mtcars" = mtcars)})

  # Dynamically create the selectInput
  output$xvar2 <- renderUI({selectInput("xvar", "Choose x",choices = names(dataSource()), selected = names(dataSource())[1])})
  output$yvar2 <- renderUI({selectInput("yvar", "Choose y",choices = names(dataSource()), selected = names(dataSource())[2])})
  output$idvar2 <- renderUI({selectInput("idvar", "Choose id",choices = names(dataSource()), selected = names(dataSource())[3])})

  my_subset_data <- reactive({        

    # Check that the selected column names actually exist in the current dataset
    if(any(input$xvar %in% names(dataSource())) && any(input$yvar %in% names(dataSource())))
    {
      df <- subset(dataSource(), select = c(input$xvar, input$yvar))
      names(df) <- c("x","y")
      return(df)
    }
  })

  observe({
    test <- my_subset_data()
    # Test for NULL: ggvis evaluates this very early, while my_subset_data() is still NULL
    if(!is.null(test)){
      test %>% ggvis(~x, ~y) %>% layer_points() %>% bind_shiny("yup")
    }
  })
})

shinyApp(ui = ui, server = server)
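
The column check in my_subset_data can also be written as a small standalone function, which makes it easy to try outside Shiny. The sketch below is one way to do that. safe_subset is a hypothetical name used for illustration and is not part of shiny or ggvis.

```r
# Hypothetical helper: return a two-column data frame named x and y,
# or NULL when the inputs are missing or do not match the dataset.
safe_subset <- function(df, xvar, yvar) {
  # Inputs are NULL until the dynamic selectInputs have been rendered.
  if (is.null(xvar) || is.null(yvar)) return(NULL)
  # After switching datasets, the old column names may no longer exist.
  if (!all(c(xvar, yvar) %in% names(df))) return(NULL)
  out <- df[, c(xvar, yvar), drop = FALSE]
  names(out) <- c("x", "y")
  out
}

# Columns of rock are valid for rock but not for mtcars.
ok <- safe_subset(rock, "area", "peri")
stale <- safe_subset(mtcars, "area", "peri")
```

Inside the reactive this would be called as safe_subset(dataSource(), input$xvar, input$yvar), and the observer would keep its is.null() test before plotting.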

(Screenshots: output 1 for rock; output 2 for mtcars.)

Question:

I'm creating a Shiny application with a dropdown to select a graph. I pull in data from my working directory, create a few variables, and then select these variables in ggplot2 to display the graph I'd like. However, I'd like to add some individual customization to these graphs. I'm struggling to find a way to display the information differently on each graph, e.g., to have different x and y axis names for each. Here is what I currently have:

selection <- reactive({if(input$var=="Average Price by Country") {
  return(mpc)
} else if (input$var =="Average Price by Vintage") {
  return(mpv)
} else if (input$var == "Standard Deviation by Country") {
  return(sdc)
} 
})

output$plot <- renderPlot({

  function() {
    if(selection ==mpc) {
    selection() %>%
    ggplot(aes(V1, V2, fill = V3)) +
    geom_bar(stat = "identity", position="dodge")    
    }else {
    selection() %>%
    ggplot(aes(V1, V2, fill = V3)) +
    geom_bar(stat = "identity", position="dodge")    
    }
  }
})

Average Price by Country, Average Price by Vintage and Standard Deviation by Country are all options in a dropdown in the UI. Ideally, I'd like to be able to customize each ggvis graph differently. The way it's currently set up, it just displays different data with the same ggvis function. I tried to wrap selection in output$plot, but then the graph wasn't displayed at all when I ran the app.

Is there any way to write a function to do this? Thank you.


Answer:

I can't see the rest of your code, so I can't test it, but this is what I would do:

output$plot <- renderPlot({
  # Pick the data and a tag for the plot type from the dropdown value
  if (input$var == "Average Price by Country") {
    selection <- mpc
    plot_type <- "mpc"
  } else if (input$var == "Average Price by Vintage") {
    selection <- mpv
    plot_type <- "mpv"
  } else if (input$var == "Standard Deviation by Country") {
    selection <- sdc
    plot_type <- "sdc"
  }
  if (plot_type == "mpv") { # CHANGE ME: customise this branch for mpv
    selection %>%
      ggplot(aes(V1, V2, fill = V3)) +
      geom_bar(stat = "identity", position = "dodge")
  } else {
    selection %>%
      ggplot(aes(V1, V2, fill = V3)) +
      geom_bar(stat = "identity", position = "dodge")
  }
})

The big assumption is that mpc, mpv and sdc are in the workspace (i.e., load("yourworkspace.RData") before shinyServer()).
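
If the plots differ only in their labels, another option is to keep the per-choice settings in a lookup list rather than in separate branches. The following is a minimal sketch under that assumption. plot_labels and make_plot are hypothetical names, and the axis labels are made up because the real meaning of V1 and V2 is not shown in the question.

```r
# Hypothetical axis labels per dropdown choice; adjust to the real data.
plot_labels <- list(
  "Average Price by Country"      = list(x = "Country", y = "Average price"),
  "Average Price by Vintage"      = list(x = "Vintage", y = "Average price"),
  "Standard Deviation by Country" = list(x = "Country", y = "Standard deviation")
)

# Build the bar chart for one choice.
# `df` is assumed to have columns V1, V2 and V3, as in the question.
make_plot <- function(df, choice) {
  lab <- plot_labels[[choice]]
  ggplot2::ggplot(df, ggplot2::aes(V1, V2, fill = V3)) +
    ggplot2::geom_bar(stat = "identity", position = "dodge") +
    ggplot2::labs(x = lab$x, y = lab$y, title = choice)
}
```

Inside renderPlot this would be called as make_plot(selection, input$var). Anything that differs more than labels (a different geom, for example) would still need its own branch.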

Question:

I have a numeric vector, counts, which stores the total count of a column for each of several tables. The table names are stored in list.

head(counts) gives:

head(counts)
## [1] 1000 1000   40 1000 1000  624

1000 is the value for table1, 1000 for table2, 40 for table3, and so on.

and head(list) gives:

head(list)
## [1] "table1"        "table2"    "table3"
## [4] "table"         "table"     "table6"

When I do barplot(counts) I get a bar plot, but I can't draw one using ggvis or ggplot. For ggplot I get this error:

ggplot2 doesn't know how to deal with data of class numeric.  

So I converted it to a data.frame and stored it as a new variable:

newdata <- data.frame(counts,list)

When I ran head(newdata) I got this:

##   counts             list
## 1   1000             table1
## 2   1000             table2
## 3     40             table3

But when I try to draw a bar plot using ggvis I get an error:

ggvis(newdata, props(x = ~list, y = ~counts, y2 = 0)) +
  mark_rect(props(width := 10))
error in new_prop.default... : unknown input to pro: list(property = "x" ...)

And if I draw it with ggplot, using ggplot(newdata, aes(x = list, y = counts)), I get a blank graph. Any idea?


Answer:

barplot with ggplot

library(ggplot2)
ggplot(newdata, aes(x = list, y = counts)) + geom_bar(stat = "identity")

barplot with ggvis

library(ggvis)
newdata %>% ggvis(~list, ~counts) %>% layer_bars()
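
The ggplot() call in the question came out blank because it had no geom layer. Adding geom_bar(stat = "identity") draws one bar per row. Both calls above need a data frame with one row per table, which can be built as in the sketch below. It uses the three values shown in the question and the name tables rather than list, to avoid masking base R's list() function. That renaming is a suggestion, not part of the original answer.

```r
# The first three counts and table names shown in the question.
counts <- c(1000, 1000, 40)
tables <- c("table1", "table2", "table3")

# One row per table: a name column for the x axis, a count column for bar height.
newdata <- data.frame(table = tables, counts = counts, stringsAsFactors = FALSE)
str(newdata)
```

With this data frame, the aesthetics become x = table and y = counts in ggplot2, or ~table and ~counts in ggvis.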