## Coloring boxplot outlier points in ggplot2?

How can I color the outlier points in ggplot2? I want them to be the same color as the boxplot itself. `colour=` is not enough to do this.

Example:

    library(ggplot2)
    p <- ggplot(mtcars, aes(factor(cyl), mpg))
    p + geom_boxplot(aes(colour=factor(cyl)))

I want to color the outliers by `factor(cyl)` as well. This does not work:

    p <- ggplot(mtcars, aes(factor(cyl), mpg))
    p + geom_boxplot(aes(colour=factor(cyl), outlier.colour=factor(cyl)))

In order to color the outlier points the same as your boxplots, you're going to need to calculate the outliers and plot them separately. As far as I know, the built-in option for coloring outliers colors all outliers the same color.

**The help file example**

Using the same data as the 'geom_boxplot' help file:

    ggplot(mtcars, aes(x=factor(cyl), y=mpg, col=factor(cyl))) +
      geom_boxplot()

**Coloring the outlier points**

There may be a more streamlined way to do this, but I prefer to calculate things by hand so I don't have to guess what's going on under the hood. Using the 'plyr' package, we can quickly get the upper and lower limits under the default (Tukey) method, which treats any point outside the range [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] as an outlier. Q1 and Q3 are the 1/4 and 3/4 quantiles of the data, and IQR = Q3 - Q1. We could write this all as one huge statement, but since the 'plyr' package's 'mutate' function lets us reference newly created columns, we might as well split it up for easier reading and debugging, like so:

    library(plyr)
    plot_Data <- ddply(mtcars, .(cyl), mutate,
                       Q1=quantile(mpg, 1/4),
                       Q3=quantile(mpg, 3/4),
                       IQR=Q3-Q1,
                       upper.limit=Q3+1.5*IQR,
                       lower.limit=Q1-1.5*IQR)

We use the 'ddply' function because we are inputting a data frame and want a data frame as output ("d->d" ply). The 'mutate' function in the above 'ddply' statement preserves the original data frame and adds the new columns, and the specification of `.(cyl)` tells 'ddply' to perform the calculations separately for each group of 'cyl' values.
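If you would rather not depend on 'plyr', the same per-group limits can be sketched in base R. This is only an illustration and not part of the original answer: it assumes the built-in `mtcars` data, and the names `limits` and `outliers` are made up for the example.

```r
# Base R sketch of the grouped Tukey limits (no plyr required).
# ave() applies FUN within each group and returns a vector aligned with
# the original rows, similar in spirit to ddply(..., mutate, ...).
limits <- mtcars
limits$Q1 <- ave(limits$mpg, limits$cyl, FUN = function(x) quantile(x, 1/4))
limits$Q3 <- ave(limits$mpg, limits$cyl, FUN = function(x) quantile(x, 3/4))
limits$IQR <- limits$Q3 - limits$Q1
limits$upper.limit <- limits$Q3 + 1.5 * limits$IQR
limits$lower.limit <- limits$Q1 - 1.5 * limits$IQR

# Rows that fall outside the limits of their own cyl group
outliers <- limits[limits$mpg > limits$upper.limit |
                   limits$mpg < limits$lower.limit, ]
```

Unlike 'ddply', this keeps the original row order and row names, which may or may not matter for your plot.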

At this point, we can now plot the boxplot and then overwrite the outliers with new, colored points.

    ggplot() +
      geom_boxplot(data=plot_Data, aes(x=factor(cyl), y=mpg, col=factor(cyl))) +
      geom_point(data=plot_Data[plot_Data$mpg > plot_Data$upper.limit |
                                plot_Data$mpg < plot_Data$lower.limit,],
                 aes(x=factor(cyl), y=mpg, col=factor(cyl)))

In this code, we start from an empty 'ggplot' object and then add the boxplot and point geometries, each with its own data. The boxplot geometry could use the original data frame, but I am using our new 'plot_Data' to be consistent. The point geometry plots only the outlier points, using our new 'lower.limit' and 'upper.limit' columns to determine outlier status. Since we use the same specification for the 'x' and 'col' aesthetic arguments, the colors match between the boxplots and the corresponding outlier points.
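As a sanity check, one can compare the hand-computed outliers with the points ggplot2 itself decides to draw. The sketch below is an illustration rather than part of the original answer: it assumes that `layer_data()` exposes an `outliers` list-column for boxplot layers (true in the ggplot2 versions I am aware of), and the names `drawn` and `by_hand` are hypothetical.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, col = factor(cyl))) +
  geom_boxplot()

# layer_data() returns the data ggplot2 computed for a layer; for a
# boxplot layer there is one row per box and a list-column `outliers`.
built <- layer_data(p, 1)
drawn <- sort(unlist(built$outliers))

# Hand-computed Tukey outliers for each cyl group
by_hand <- unlist(lapply(split(mtcars$mpg, mtcars$cyl), function(x) {
  q <- quantile(x, c(1/4, 3/4))
  fence <- 1.5 * (q[2] - q[1])
  x[x < q[1] - fence | x > q[2] + fence]
}))

# TRUE if both approaches flag the same mpg values
isTRUE(all.equal(drawn, sort(unname(by_hand))))
```

If the two ever disagree (for example after changing `coef` in `geom_boxplot()`), the manual limits need the same adjustment.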

**Update**: The OP requested a more complete explanation of the 'ddply' function used in this code. Here it is:

The 'plyr' family of functions is essentially a way of splitting data into subsets and applying a function to each subset. In this particular case, we have the statement:

    ddply(mtcars, .(cyl), mutate,
          Q1=quantile(mpg, 1/4),
          Q3=quantile(mpg, 3/4),
          IQR=Q3-Q1,
          upper.limit=Q3+1.5*IQR,
          lower.limit=Q1-1.5*IQR)

Let's break this down in the order the statement would be written. First, the selection of the 'ddply' function. We want to calculate the lower and upper limits for each value of 'cyl' in the 'mtcars' data. We could write a 'for' loop or other statement to calculate these values, but then we would have to write another logic block later to assess outlier status. Instead, we want to use 'ddply' to calculate the lower and upper limits and add those values to every line. We choose 'ddply' (as opposed to 'dlply', 'd_ply', etc.), because we are inputting a data frame and wanting a data frame as output. This gives us:

    ddply(

We want to perform the statement on the 'mtcars' data frame, so we add that.

    ddply(mtcars,

Now, we want to perform our calculations using the 'cyl' values as a grouping variable. We use the 'plyr' function `.()` to refer to the variable itself rather than to the variable's value, like so:

    ddply(mtcars, .(cyl),

The next argument specifies the function to apply to every group. We want our calculation to add new columns to the old data, so we choose the 'mutate' function. This preserves the old data and adds the new calculations as new columns. This is in contrast to other functions like 'summarize', which removes all of the old columns except the grouping variable(s).

    ddply(mtcars, .(cyl), mutate,

The final series of arguments are all of the new columns of data we want to create. We define these by specifying a name (unquoted) and an expression. First, we create the 'Q1' column.

    ddply(mtcars, .(cyl), mutate, Q1=quantile(mpg, 1/4),

The 'Q3' column is calculated similarly.

    ddply(mtcars, .(cyl), mutate, Q1=quantile(mpg, 1/4), Q3=quantile(mpg, 3/4),

Luckily, with the 'mutate' function, we can use newly created columns as part of the definition of other columns. This saves us from having to write one giant function or from having to run multiple functions. We need to use 'Q1' and 'Q3' in the calculation of the inter-quartile range for the 'IQR' variable, and that's easy with the 'mutate' function.

    ddply(mtcars, .(cyl), mutate, Q1=quantile(mpg, 1/4), Q3=quantile(mpg, 3/4), IQR=Q3-Q1,

We're finally where we want to be now. We technically don't need the 'Q1', 'Q3', and 'IQR' columns, but they make our lower limit and upper limit equations a lot easier to read and debug. We can write our expressions just like the theoretical formulas, `upper limit = Q3 + 1.5 * IQR` and `lower limit = Q1 - 1.5 * IQR`:

    ddply(mtcars, .(cyl), mutate,
          Q1=quantile(mpg, 1/4),
          Q3=quantile(mpg, 3/4),
          IQR=Q3-Q1,
          upper.limit=Q3+1.5*IQR,
          lower.limit=Q1-1.5*IQR)
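To make the arithmetic concrete, here is a small worked check for a single group, written in base R. It is only a sketch: the variable names (`mpg8`, `q1`, `q3`, `iqr`) are made up for illustration, and it looks at the 8-cylinder cars only.

```r
# Worked example for the cyl == 8 group of the built-in mtcars data
mpg8 <- mtcars$mpg[mtcars$cyl == 8]

q1  <- unname(quantile(mpg8, 1/4))   # first quartile
q3  <- unname(quantile(mpg8, 3/4))   # third quartile
iqr <- q3 - q1                       # inter-quartile range

# Same formulas as in the ddply() call
c(Q1 = q1, Q3 = q3, IQR = iqr,
  upper.limit = q3 + 1.5 * iqr,
  lower.limit = q1 - 1.5 * iqr)
```

These should agree with the values that 'ddply' attaches to every 8-cylinder row.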

Cutting out the middle columns for readability, this is what the new data frame looks like:

    plot_Data[, c(-3:-11)]
    #     mpg cyl    Q1    Q3  IQR upper.limit lower.limit
    #  1 22.8   4 22.80 30.40 7.60      41.800      11.400
    #  2 24.4   4 22.80 30.40 7.60      41.800      11.400
    #  3 22.8   4 22.80 30.40 7.60      41.800      11.400
    #  4 32.4   4 22.80 30.40 7.60      41.800      11.400
    #  5 30.4   4 22.80 30.40 7.60      41.800      11.400
    #  6 33.9   4 22.80 30.40 7.60      41.800      11.400
    #  7 21.5   4 22.80 30.40 7.60      41.800      11.400
    #  8 27.3   4 22.80 30.40 7.60      41.800      11.400
    #  9 26.0   4 22.80 30.40 7.60      41.800      11.400
    # 10 30.4   4 22.80 30.40 7.60      41.800      11.400
    # 11 21.4   4 22.80 30.40 7.60      41.800      11.400
    # 12 21.0   6 18.65 21.00 2.35      24.525      15.125
    # 13 21.0   6 18.65 21.00 2.35      24.525      15.125
    # 14 21.4   6 18.65 21.00 2.35      24.525      15.125
    # 15 18.1   6 18.65 21.00 2.35      24.525      15.125
    # 16 19.2   6 18.65 21.00 2.35      24.525      15.125
    # 17 17.8   6 18.65 21.00 2.35      24.525      15.125
    # 18 19.7   6 18.65 21.00 2.35      24.525      15.125
    # 19 18.7   8 14.40 16.25 1.85      19.025      11.625
    # 20 14.3   8 14.40 16.25 1.85      19.025      11.625
    # 21 16.4   8 14.40 16.25 1.85      19.025      11.625
    # 22 17.3   8 14.40 16.25 1.85      19.025      11.625
    # 23 15.2   8 14.40 16.25 1.85      19.025      11.625
    # 24 10.4   8 14.40 16.25 1.85      19.025      11.625
    # 25 10.4   8 14.40 16.25 1.85      19.025      11.625
    # 26 14.7   8 14.40 16.25 1.85      19.025      11.625
    # 27 15.5   8 14.40 16.25 1.85      19.025      11.625
    # 28 15.2   8 14.40 16.25 1.85      19.025      11.625
    # 29 13.3   8 14.40 16.25 1.85      19.025      11.625
    # 30 19.2   8 14.40 16.25 1.85      19.025      11.625
    # 31 15.8   8 14.40 16.25 1.85      19.025      11.625
    # 32 15.0   8 14.40 16.25 1.85      19.025      11.625

Just to give a contrast, if we were to do the same 'ddply' statement with the 'summarize' function, instead, we would have all of the same answers but without the columns of the other data.

    ddply(mtcars, .(cyl), summarize,
          Q1=quantile(mpg, 1/4),
          Q3=quantile(mpg, 3/4),
          IQR=Q3-Q1,
          upper.limit=Q3+1.5*IQR,
          lower.limit=Q1-1.5*IQR)
    #   cyl    Q1    Q3  IQR upper.limit lower.limit
    # 1   4 22.80 30.40 7.60      41.800      11.400
    # 2   6 18.65 21.00 2.35      24.525      15.125
    # 3   8 14.40 16.25 1.85      19.025      11.625
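For readers who use 'dplyr' (the successor to 'plyr') instead, the same contrast between 'mutate' and 'summarize' can be sketched as follows. This is not from the original answer; it assumes 'dplyr' is installed, and the names `plot_data_dplyr` and `limits_dplyr` are hypothetical.

```r
library(dplyr)

# group_by() + mutate(): keeps all 32 rows and adds the new columns,
# playing the role of ddply(..., mutate, ...)
plot_data_dplyr <- mtcars %>%
  group_by(cyl) %>%
  mutate(Q1 = quantile(mpg, 1/4),
         Q3 = quantile(mpg, 3/4),
         IQR = Q3 - Q1,
         upper.limit = Q3 + 1.5 * IQR,
         lower.limit = Q1 - 1.5 * IQR) %>%
  ungroup()

# group_by() + summarise(): one row per cyl group,
# playing the role of ddply(..., summarize, ...)
limits_dplyr <- mtcars %>%
  group_by(cyl) %>%
  summarise(Q1 = quantile(mpg, 1/4),
            Q3 = quantile(mpg, 3/4),
            IQR = Q3 - Q1,
            upper.limit = Q3 + 1.5 * IQR,
            lower.limit = Q1 - 1.5 * IQR)
```

As with 'plyr', later expressions can refer to columns created earlier in the same call, which is what lets `IQR` build on `Q1` and `Q3`.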

**outlier.colour in geom_boxplot · Issue #1400 · tidyverse/ggplot2:** It would be great if it was possible to make outliers automatically inherit the colours of the boxplot as an alternative to the default `outlier.colour`.

As @koshke said, having the outliers colored like the lines of the box (not the fill color) is now easily possible by setting `outlier.colour = NULL`:

    m <- ggplot(movies, aes(y = votes, x = factor(round(rating)), colour = factor(Animation)))
    m + geom_boxplot(outlier.colour = NULL) + scale_y_log10()

`outlier.colour` must be written with "ou".

I found a solution to the fact that setting `geom_boxplot(outlier.colour = NULL)` no longer works in newer versions of ggplot2 (@hamy mentions version 1.0.0).

In order to replicate the behaviour that @cbeleites proposed you simply need to use the following code:

    update_geom_defaults("point", list(colour = NULL))
    m <- ggplot(movies, aes(y = votes, x = factor(round(rating)), colour = factor(Animation)))
    m + geom_boxplot() + scale_y_log10()

As expected, this produces a plot whose outlier points match the line color.

Of course, remember to restore the default if you need to draw multiple plots:

    update_geom_defaults("point", list(colour = "black"))

The solution was found by reading the ggplot2 changelog on GitHub:

The outliers of `geom_boxplot()` use the default colour, size and shape from `geom_point()`. Changing the defaults of `geom_point()` with `update_geom_defaults()` will apply the same changes to the outliers of `geom_boxplot()`. Changing the defaults for the outliers was previously not possible. (@ThierryO, #757)

*Posted here as well: ggplot2 boxplot, how do i match the outliers' color to fill aesthetics?*

If you need to change the shape or color of the outlier points according to a different factor (not the one used to form the boxplot groups), the answer of @Dinre can be adapted.

The color of the points can be changed this way only if color isn't already used for the boxplots themselves (you can't map two variables to color).

Using the data `plot_Data` and the code from the @Dinre answer, the color of the outliers here depends on the factor `carb`. Adding the argument `outlier.shape = NA` to `geom_boxplot()` removes the original outliers so that they are not over-plotted by `geom_point()`.

    ggplot() +
      geom_boxplot(data=plot_Data, aes(x=factor(cyl), y=mpg), outlier.shape = NA) +
      geom_point(data=plot_Data[plot_Data$mpg > plot_Data$upper.limit |
                                plot_Data$mpg < plot_Data$lower.limit,],
                 aes(x=factor(cyl), y=mpg, color=factor(carb)))

To change the shape of points:

    ggplot() +
      geom_boxplot(data=plot_Data, aes(x=factor(cyl), y=mpg), outlier.shape = NA) +
      geom_point(data=plot_Data[plot_Data$mpg > plot_Data$upper.limit |
                                plot_Data$mpg < plot_Data$lower.limit,],
                 aes(x=factor(cyl), y=mpg, shape=factor(carb)))

Outliers automatically inherit the colours of the box again as of ggplot2 1.0.1.9003.

https://github.com/hadley/ggplot2/issues/1400

```{r}
library(ggplot2)
point_size = 10
ggplot(mtcars, aes(x=factor(cyl), y=mpg, col=factor(cyl))) +
  geom_boxplot(outlier.size = point_size)
```

##### Comments

- Can you post a code sample for people to work from? That will encourage more useful answers.
- @Dinre: good point, added example
- Thank you it's helpful but very complex... isn't it possible to just fetch the colors of the `factor(cyl)` and handfeed a vector to `outlier.colour`? I.e. to tell ggplot explicitly what color each of the outliers should be rather than computing the outlier points?
- @user248237dfsf As far as I know, the `outlier.colour=` argument will not allow for a vector of colors. You're trying to do something outside of the 'geom_boxplot' function's expected behavior, so it cannot be done from within the function. Besides, a couple of lines of code isn't exactly what I would call "very complex." More complex than a single argument? Of course. More complex than writing a new boxplot function? Not by a long shot.
- Complex meaning it requires you to repeat the calculation of outliers which is normally done by boxplot, and introduces the possibility that you'll calculate outliers slightly differently than the boxplot, etc. I would in this case prefer a method that just sets the color object for each boxplot manually but if that's not possible I guess this is the only way
- @user248237dfsf It's true that we are calculating it again, but the method does come straight from the 'geom_boxplot' help file. It should be exact. The only way to know for sure is to manually feed the numbers to both geometries, which is what you should be doing anyway if you want to have a truly reproducible example. Nothing that relies on function defaults is really reproducible, since the defaults could be changed in an update.
- Thanks. I still don't quite get `plot_Data <- ddply(mtcars, .(cyl), mutate, Q1=quantile(mpg, 1/4), Q3=quantile(mpg, 3/4), IQR=Q3-Q1, upper.limit=Q3+1.5*IQR, lower.limit=Q1-1.5*IQR)` - if you could explain more what `.(cyl)` is exactly and how `mutate` works that would be appreciated. I find `ddply` pretty cryptic
- What version of ggplot2 were you using? With 1.0.0 this does not produce colored outliers
- @Hamy: I just updated to 1.0.0 (from 0.9.3) and can confirm the problem. A quick look into the help did not lead to a solution. Maybe you could ask the ggplot2 developers what to do?
- @cbeleites See the newest answer by tarch below for the correct solution. `NULL` is the default for outlier color, and it inherits from the default point color, so you have to set that instead. I'm afraid I have to downvote the post so that the now correct one has a better chance of floating to the top. I know your answer was correct at some point in time, but it's not anymore, unfortunately.
- I can confirm this works for newer versions; this should be the topmost answer.
- works for me too, also with the AE spelling `color="black"`.
- Works for me! Thanks!
- thanks but I am trying to color it by a variable condition. This does not seem to work: `p <- ggplot(mtcars, aes(factor(cyl), mpg)); p + geom_boxplot(aes(outlier.colour = factor(cyl)), outlier.size = 3)`