Moving variance in R

I know that the filter() function in R calculate the moving average. I would like to know if exists a function that return me the moving variance or standard deviation, in order to show it in a plot side by side with the output of filter() function.

Consider the zoo package. For example filter() gives:

> filter(1:100, rep(1/3,3))
Time Series:
Start = 1 
End = 100 
Frequency = 1 
  [1] NA  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
 [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
 [76] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 NA

whereas rollmean() in zoo gives:

> rollmean(1:100, k = 3, na.pad = TRUE)
  [1] NA  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
 [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
 [76] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 NA

which is the same (for a 3 point moving average in this example).

Whilst zoo doesn't have a rollsd() or rollvar() it does have rollapply(), which works like the apply() functions to apply any R function to the specified window.

> rollapply(1:100, width = 3, FUN = sd, na.pad = TRUE)
  [1] NA  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [26]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [51]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 [76]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 NA
Warning message:
In rollapply.zoo(zoo(data), ...) : na.pad argument is deprecated

or on something more interesting:

> rollapply(vec, width = 3, FUN = sd, na.pad = TRUE)
  [1]        NA 0.3655067 0.8472871 0.5660495 0.3491970 0.4732417 0.9236859
  [8] 0.8075226 1.8725851 1.1930784 0.6329325 1.1412416 0.8430772 0.5808005
 [15] 0.3838545 1.1738170 1.1655400 1.3241700 0.6876834 0.1534157 0.4858477
 [22] 0.9843506 0.6002713 0.6897541 2.0619563 2.5675788 6.3522039 6.0066864
 [29] 6.2618432 5.1704866 2.1360853 2.5602557 1.0408528 1.0316396 4.9441628
 [36] 5.0319314 5.7589716 3.2425000 4.8788158 2.0847286 4.5199291 2.5323486
 [43] 2.1987149 1.8393000 1.2278639 1.5998965 1.5341485 4.4287108 4.4159166
 [50] 4.3224546 3.6959067 4.9826264 5.3134044 8.4084322 9.1249234 7.5506725
 [57] 3.8499136 3.9680487 5.6362296 4.9124095 4.3452706 4.0227141 4.5867559
 [64] 4.7350394 4.3203807 4.4506799 7.2759499 7.6536424 7.8487654 2.0905576
 [71] 4.0056880 5.6209853 1.5551659 1.3615268 2.8469458 2.8323588 1.9848578
 [78] 1.1201124 1.4248380 1.7802571 1.4281773 2.5481935 1.8554451 1.0925410
 [85] 2.1823722 2.2788755 2.4205378 2.0733741 0.7462248 1.3873578 1.4265948
 [92] 0.7212619 0.7425993 1.0696432 2.4520585 3.0555819 3.1000885 1.0945292
 [99] 0.3726928        NA
Warning message:
In rollapply.zoo(zoo(data), ...) : na.pad argument is deprecated

You can get rid of the warning by using the fill = NA argument, as in

> rollapply(vec, width = 3, FUN = sd, fill = NA)

runsd function, Moving (aka running, rolling) Window's Standard Deviation calculated over a vector. The exponential moving average is a weighted moving average that reduces influences by applying more weight to recent data points reduction factor 2/(n+1); or r for``running", this is an exponential moving average with a reduction factor of 1/n [same as the modified average?].

The TTR package has runSD among others:

> library(TTR)
> ls("package:TTR", pattern="run*")
 [1] "runCor"    "runCov"    "runMAD"    "runMax"    "runMean"  
 [6] "runMedian" "runMin"    "runSD"     "runSum"    "runVar"

runSD will be much faster than rollapply because it avoids making many R function calls. For example:

rzoo <- function(x,n) rollapplyr(x, n, sd, fill=NA)
rttr <- function(x,n) runSD(x, n)
library(rbenchmark)
set.seed(21)
x <- rnorm(1e4)
all.equal(rzoo(x,250), rttr(x,250))
# [1] TRUE
benchmark(rzoo(x,250), rttr(x,250))[,1:6]
#           test replications elapsed relative user.self sys.self
# 2 rttr(x, 250)          100    0.58    1.000      0.58     0.00
# 1 rzoo(x, 250)          100   54.53   94.017     53.85     0.06

[R] moving mean and moving variance functions, [R] moving mean and moving variance functions. Gabor Grothendieck ggrothendieck at gmail.com. Mon Apr 4 15:17:16 CEST 2011. Previous message: [R]� The main incentive to write this set of functions was relative slowness of majority of moving window functions available in R and its packages. With the exception of runmed, a running window median function, all functions listed in "see also" section are slower than very inefficient “apply(embed(x,k),1,FUN)” approach.

rollapply in the zoo package takes an arbitrary function. It's different from filter in that it excludes the NAs by default.

That being said, though, there's not much sense in loading a package for a function that's so simple to roll yourself (pun intended).

Here's one that's right aligned:

my.rollapply <- function(vec, width, FUN) 
    sapply(seq_along(vec), 
           function(i) if (i < width) NA else FUN(vec[i:(i-width+1)]))

set.seed(1)
vec <- sample(1:50, 50)
my.rollapply(vec, 3, sd)
 [1]        NA        NA  7.094599 12.124356 16.522712 18.502252 18.193405  7.234178  8.144528
[10] 14.468356 12.489996  3.055050 20.808652 19.467922 18.009257 18.248288 15.695010  7.505553
[19] 10.066446 11.846237 17.156146  6.557439  5.291503 23.629078 22.590558 21.197484 22.810816
[28] 24.433583 19.502137 16.165808 11.503623 12.288206  9.539392 13.051181 13.527749 19.974984
[37] 19.756855 17.616280 19.347696 18.248288 15.176737  6.082763 10.000000 10.016653  4.509250
[46]  2.645751  1.527525  5.291503 10.598742  6.557439

# rollapply output for comparison
rollapply(vec, width=3, sd, fill=NA, align='right')
 [1]        NA        NA  7.094599 12.124356 16.522712 18.502252 18.193405  7.234178  8.144528
[10] 14.468356 12.489996  3.055050 20.808652 19.467922 18.009257 18.248288 15.695010  7.505553
[19] 10.066446 11.846237 17.156146  6.557439  5.291503 23.629078 22.590558 21.197484 22.810816
[28] 24.433583 19.502137 16.165808 11.503623 12.288206  9.539392 13.051181 13.527749 19.974984
[37] 19.756855 17.616280 19.347696 18.248288 15.176737  6.082763 10.000000 10.016653  4.509250
[46]  2.645751  1.527525  5.291503 10.598742  6.557439

[PDF] Package 'roll', "Updating Mean and Variance Estimates: An Improved Method. A list of objects with the rolling and expanding r-squareds for each y. the estimated population variance is 8.4 square inches, and the estimated population standard deviation is 2.92 inches (rounded off). Using R to compute standard deviation As is the case with variance, using R to compute the standard deviation is easy: You use the sd() function.

runner function in runner package applies any R function on running windows. With runner one can specify simple window by setting the length k or lag. Below moving sd as suggested by OOP on 4-elements windows.

library(runner)

set.seed(1)
x <- rnorm(20, sd = 1)
runner(x, sd, k = 4, na_pad = TRUE)

#[1]        NA        NA        NA 1.1021597 0.9967429 1.1556947 0.9884053 0.6902835 0.7180483 0.4647160
#[11] 0.7454670 0.7489618 0.9449882 1.5821988 1.4459037 1.3889432 1.3954101 0.6193867 0.5296744 0.4266423

To apply running functions on date-windows one should specify idx. Length of idx should be the same length as x and should be of type Date or integer. Example below illustrates window of size k = 4 lagged by lag = 1. In parentheses index ranges for each window.

idx <- c(4, 6, 7, 13, 17, 18, 18, 21, 27, 31, 37, 42, 44, 47, 48)
runner::runner(x = 1:15, 
               k = 5,
               lag = 1,
               idx = idx,
               f = function(x) mean(x))

# [1]   NA  1.0  1.5   NA  4.0  4.5  4.5  6.0   NA  9.0   NA 11.0 12.0 12.5 13.5

More info in documentation and vignettes

Convenience Functions, Moving Window Statistics, and Graphics, Functions for calculating moving-window statistics. The base R function range returns the minimum and maximum of a vector, but the for σ2: the MLE (n in denominator) vs. the sample variance (n−1 in denominator). The international passenger data series (G) time series data requires more robust methods such as Moving Median, Kernal Smoothing, ARIMA, or UCM (see “Unobserved Component Models using R”). Nevertheless, R offers several useful function for exponential smoothing, including some not discussed here, for instance in the QCC-Package.

Rolling variance in r, This random variable (proportion of 6s) has mean p =1/6 and variance p*(1-p)/n. 67$. The rolling regression estimator of R is defined as: Thus the weights are� In this case, because we used a moving range of length 2, the average moving range gives us an estimate of the average distance between our consecutive individual data points. A moving range of length 2 is Minitab’s default, but that can be changed by clicking the I-MR Options button in the I-MR chart dialog, and then choosing the Estimate tab:

Moving Window Statistics — GSL 2.6 documentation, The moving window variance calculates the sample variance of the values of ei = gsl_ran_gaussian(r, 0.1); gsl_vector_set(x, i, xi + ei); } /* compute moving� In a moving average regression model, a variable of interest is assumed to be a weighted moving average of unobserved independent error terms; the weights in the moving average are parameters to be estimated.

R help - moving mean and moving variance functions, moving mean and moving variance functions. Hello Lets say as an example I have a dataframe with the following attributes: rownum(1:405),� Let \(w_t \overset{iid}{\sim} N(0, \sigma^2_w)\), meaning that the w t are identically, independently distributed, each with a normal distribution having mean 0 and the same variance. The 1 st order moving average model, denoted by MA(1) is: \(x_t = \mu + w_t +\theta_1w_{t-1}\) The 2 nd order moving average model, denoted by MA(2) is:

Comments
  • Thank you Gavin, I guess this is an add-on question, but do you know of a base R solution that uses ts?
  • @Zhubarb nope, not a single function. You could code this out of the base R parts (as rollapply() has been), but there is nothing canned that I am aware of
  • Nice package, thank you! For other users, it looks like if you supply a vector of length <n this function yields an error, which means if you apply it to many groups in a dataset (e.g. different stocks, using data.table), and any of them has too little data, the whole operation will yield an error of type is outside valid range:.... Users could solve this by wrapping the function to give different behavior (like returning NAs) when too little data is supplied.
  • @MichaelOhlrogge: thanks for the comment! I like the idea of returning all NA when too little data is supplied. I'll consider how to implement something like that directly in the TTR functions (see #68).
  • nice solution. It is a little faster than rollapplyr. By some reason, in sd, in about 500 data, there is a little difference: 1e-14. Why there should be rounding(?) issues?