## Simulate 5000 samples of size 5 from a normal distribution with mean 5 and standard deviation 3

I am trying to simulate 5000 samples of size 5 from a normal distribution with mean 5 and standard deviation 3. I then want to compute the mean of each sample and make a histogram of the sample means.

My current code is not giving me an error but I don't think it's right:

    nrSamples = 5000
    e <- list(mode="vector",length=nrSamples)
    for (i in 1:nrSamples) {
      e[[i]] <- rnorm(n = 5, mean = 5, sd = 3)
    }
    sample_means <- matrix(NA, 5000,1)
    for (i in 1:5000){
      sample_means[i] <- mean(e[[i]])
    }

Any idea on how to tackle this? I am very very new to R!

Your code is fine (see below), but I would suggest you try the following:

    yourlist <- lapply(1:nrSamples, function(x) rnorm(n = 5, mean = 5, sd = 3))
    yourmeans <- sapply(yourlist, mean)

Here, for each element of the sequence 1, 2, 3, ..., `nrSamples` that I supply as the first argument, `lapply` executes a function with the given element of the sequence as its argument (i.e. `x`). The function that I have supplied does not depend on `x`, however, so it is simply run 5000 times, and the output is stored in a list (this is what `lapply` does). It is an easy way to avoid loops in situations like these. Needless to say, you could also just run

    yourmeans <- sapply(1:nrSamples, function(x) mean(rnorm(n = 5, mean = 5, sd = 3)))

The latter keeps only the means and discards the individual samples, though, which may not be what you want. Also note that I call `sapply` to return a vector, which you can then use to plot your histogram, using e.g. `hist(yourmeans)`.
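As a rough sanity check on the means, you can compare them with what theory predicts: the mean of a sample of size 5 from N(5, 3) has expected value 5 and standard deviation 3 / sqrt(5). The sketch below is only illustrative. The seed is arbitrary, and I use `vapply` in place of `sapply` purely as a stylistic choice to pin the return type.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible
nrSamples <- 5000

# Means of 5000 samples of size 5 from a normal with mean 5 and sd 3
yourmeans <- vapply(seq_len(nrSamples),
                    function(i) mean(rnorm(n = 5, mean = 5, sd = 3)),
                    numeric(1))

mean(yourmeans)  # should be close to 5
sd(yourmeans)    # should be close to the theoretical value below
3 / sqrt(5)      # theoretical standard error, about 1.34

hist(yourmeans, main = "Sample means (n = 5)", xlab = "Sample mean")
```

If the two numbers are far from 5 and 1.34, something is likely wrong with how the samples were generated.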

**To show that your code is fine, consider the following:**

    set.seed(42)
    nrSamples = 5000
    e <- list(mode="vector",length=nrSamples)
    for (i in 1:nrSamples) {
      e[[i]] <- rnorm(n = 5, mean = 5, sd = 3)
    }
    sample_means <- matrix(NA, 5000,1)
    for (i in 1:5000){
      sample_means[i] <- mean(e[[i]])
    }

    set.seed(42)
    yourlist <- lapply(1:nrSamples, function(x) rnorm(n = 5, mean = 5, sd = 3))
    yourmeans <- sapply(yourlist, mean)

    all.equal(as.vector(sample_means), yourmeans)
    [1] TRUE

Here, I set the seed of the random number generator before each run to make sure that both versions draw the same random numbers. As you can see, your code works fine, though as others have pointed out, loops can easily be avoided.
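To see in isolation why the seed matters for this comparison, here is a minimal sketch. The variable names `first`, `second` and `third` are mine and purely illustrative.

```r
set.seed(42)
first <- rnorm(n = 5, mean = 5, sd = 3)

set.seed(42)  # resetting the seed restarts the generator from the same state
second <- rnorm(n = 5, mean = 5, sd = 3)

third <- rnorm(n = 5, mean = 5, sd = 3)  # no reset: continues the random stream

identical(first, second)  # same seed, same draws
identical(first, third)   # different draws
```

Without the second `set.seed(42)`, the two versions of the code would produce different samples and `all.equal` would report differences even though both are correct.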


You don't need a list in this case. It is a common mistake of new R users to use lists excessively.

    observations <- matrix(rnorm(25000, mean = 5, sd = 3), 5000, 5)
    means <- rowMeans(observations)

Now `means` is a vector of 5000 elements, one mean per row (i.e. per sample).
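If it helps to convince yourself that each row really is one sample of size 5, here is a small sketch that checks the dimensions and compares `rowMeans` with an explicit row-by-row `apply`. The seed and the name `means_apply` are my own additions for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility

# 25000 draws arranged as 5000 rows (samples) by 5 columns (observations)
observations <- matrix(rnorm(5000 * 5, mean = 5, sd = 3), nrow = 5000, ncol = 5)
dim(observations)

means <- rowMeans(observations)

# Same computation written as an explicit mean over each row
means_apply <- apply(observations, 1, mean)

all.equal(means, means_apply)
length(means)
```

Because the draws are independent and identically distributed, it does not matter that `matrix` fills column by column; any 5 draws form a valid sample.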


You can actually do this without for loops. `replicate` can be used to create the 5000 samples. Then use `sapply` to return the mean of each sample. Wrap the `sapply` call in `hist()` to get the histogram of means.

    dat = replicate(5000, rnorm(5,5,3), simplify=FALSE)
    hist(sapply(dat, mean))

Or, if you want to save the means:

    sample.means = sapply(dat,mean)
    hist(sample.means)
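A variation on the same idea, sketched here as an alternative and not as a correction: if you leave `simplify` at its default, `replicate` returns a matrix with one column per sample, and `colMeans` then gives the means directly. The name `dat_matrix` and the seed are my own choices.

```r
set.seed(1)  # arbitrary seed for reproducibility

# With the default simplify = TRUE, the 5000 vectors of length 5
# are bound together as the columns of a 5 x 5000 matrix
dat_matrix <- replicate(5000, rnorm(5, mean = 5, sd = 3))
dim(dat_matrix)

# One mean per column, i.e. per sample
sample.means <- colMeans(dat_matrix)
hist(sample.means)
```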

I think your code is giving valid results. `list(mode="vector",length=nrSamples)` isn't doing what I think you intended (run it in the console and see what happens), but it works out because the first two list elements get overwritten in the loop.
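For reference, this is roughly what you would see if you inspect the two constructions side by side. The names `e_question` and `e_intended` are just labels I made up for this sketch.

```r
nrSamples <- 5000

# What the question's code creates: a list with two named elements,
# "mode" (the string "vector") and "length" (the number 5000)
e_question <- list(mode = "vector", length = nrSamples)
str(e_question)
length(e_question)  # 2
names(e_question)   # "mode" "length"

# What was probably intended: an empty list with nrSamples slots
e_intended <- vector("list", nrSamples)
length(e_intended)  # 5000
```

Assigning to `e[[i]]` for `i` from 1 to 5000 replaces those two elements and extends the list as needed, which is why the original loop still ends up with 5000 samples.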

Although there's no need to use loops here, just for illustration, here are two modified versions of your code using loops:

    # 1. Store random samples in a list
    e <- vector("list", nrSamples)
    for (i in 1:nrSamples) {
      e[[i]] <- rnorm(n = 5, mean = 5, sd = 3)
    }
    sample_means = rep(NA, nrSamples)
    for (i in 1:nrSamples){
      sample_means[i] <- mean(e[[i]])
    }

    # 2. Store random samples in a matrix
    e <- matrix(rep(NA, 5000*5), nrow=5)
    for (i in 1:nrSamples) {
      e[,i] <- rnorm(n = 5, mean = 5, sd = 3)
    }
    sample_means = rep(NA, nrSamples)
    for (i in 1:nrSamples){
      sample_means[i] <- mean(e[, i])
    }


##### Comments


- I really appreciate the help!
- Thank you! @eipi10