## Shading (or alpha) boxplots by number of datapoints with ggplot2 in R

I have a columnar data set from which I am plotting a series of box plots, much like the setup in this example: Boxplot of table using ggplot2

    require(reshape2)
    ggplot(data = melt(dd), aes(x=variable, y=value)) + geom_boxplot(aes(fill=variable))

However, in my case, each of the boxplots represents a different number of data points. For example, Column A might have 8000 data points, Column B might have 6000, Column C might have 2500, and Column D might have 800.

To help communicate this, I thought I could alpha the fill color of the box to reflect the number of datapoints. The darker the box, the more datapoints were used in computing the statistics the boxplot represents.

In the ggplot2 help file for geom_histogram, they use aes(fill=..count..) to shade the bins corresponding to the # of counts in the bin.

    m <- ggplot(movies, aes(x=rating))
    m + geom_histogram(aes(fill=..count..))

(Wanted to include a picture of the example histogram here, but can't because I don't have enough reputation points...sorry)

I tried using this with my ggplot geom_boxplot, but it doesn't seem to know the ..count.. part. Here is my line that is generating the boxplot:

    ggplot(meltedData, aes(x=variable, y=value)) +
      geom_boxplot(aes(fill=variable), outlier.size = 1) +
      ylim(-4,3)

Anyone have any pointers? I know I can add the "alpha" property to geom_boxplot, but how can I apply it to each boxplot individually based on the # of datapoints in the boxplot?

Thanks in advance.

`stat_boxplot` doesn't calculate the count. Just do it outside of `ggplot2`:

    library(plyr)
    DF <- ddply(mtcars, .(cyl), transform, myalpha = length(cyl))
    library(ggplot2)
    ggplot(DF, aes(factor(cyl), mpg)) + geom_boxplot(aes(alpha = myalpha), fill = "blue")

`data.table` option:

    library(data.table)
    dd <- data.table(dd)
    dd[,Count:=.N,by=variable]
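
The snippet above only adds the count column. As a minimal sketch of how that column might then be mapped to alpha, here is a complete version that assumes `mtcars` as stand-in data, with `cyl` playing the role of `variable` (both are substitutions for illustration, not the asker's data):

```r
library(data.table)
library(ggplot2)

# Assumption: mtcars stands in for the melted data; cyl acts as `variable`.
dd <- as.data.table(mtcars)

# Add the per-group row count by reference.
dd[, Count := .N, by = cyl]

# Count is constant within each box, so it can drive the alpha aesthetic.
p <- ggplot(dd, aes(factor(cyl), mpg)) +
  geom_boxplot(aes(alpha = Count), fill = "blue")
print(p)
```

Because `Count` is continuous, it should not affect how the boxes are grouped; grouping still comes from `factor(cyl)`.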

My version of Roland's solution using the `dplyr` package:

    library(dplyr)
    library(ggplot2)
    df <- mtcars %>% group_by(cyl) %>% mutate(my_alpha = length(cyl))
    ggplot(df, aes(factor(cyl), mpg)) + geom_boxplot(aes(alpha = my_alpha), fill = 'blue')
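
With the default alpha scale, the group with the fewest points can end up nearly transparent. A possible variation, offered as a sketch rather than a recommendation, narrows the alpha range with `scale_alpha_continuous()`. The column name `n_obs` and the range `c(0.3, 1)` are illustrative choices, not taken from the answers above:

```r
library(dplyr)
library(ggplot2)

# Count rows per group with n(); n_obs is a hypothetical column name.
df <- mtcars %>%
  group_by(cyl) %>%
  mutate(n_obs = n()) %>%
  ungroup()

# Restrict alpha to 0.3-1 so the smallest group stays visible
# (the range is an arbitrary choice for illustration).
p <- ggplot(df, aes(factor(cyl), mpg)) +
  geom_boxplot(aes(alpha = n_obs), fill = "blue") +
  scale_alpha_continuous(name = "n", range = c(0.3, 1))
print(p)
```

The legend title `"n"` is also just a placeholder; any label that tells readers the shading encodes sample size would do.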

##### Comments

- Could you please provide a reproducible example of the columns you're trying to plot?
- I don't know the whole `..count..` system very well, but I think it works with histograms because of the `stat="bin"` argument. You may have to just add `count` to the data itself.
- Sure. What do you mean by "at least"?
- I just don't see the need to list all possibilities to do this every time split-apply-combine is needed in an answer. We really need a good FAQ giving all possibilities. I chose `plyr` here because I was already in the hadleyverse.