## How can I summarize data statistics using R?

How can I write a short script that creates a new data frame reporting the following descriptive statistics for each column of continuous data in the survey below: mean, standard deviation, median, minimum value, maximum value, and sample size?

       Distance Age Height Coning
    1      21.4  18    3.3    Yes
    2      13.9  17    3.4    Yes
    3      23.9  16    2.9    Yes
    4       8.7  18    3.6     No
    5     241.8   6    0.7     No
    6      44.5  17    1.3    Yes
    7      30.0  15    2.5    Yes
    8      32.3  16    1.8    Yes
    9      31.4  17    5.0     No
    10     32.8  13    1.6     No
    11     53.3  12    2.0     No
    12     54.3   6    0.9     No
    13     96.3  11    2.6     No
    14    133.6   4    0.6     No
    15     32.1  15    2.3     No
    16     57.9  12    2.4    Yes
    17     30.8  17    1.8     No
    18     59.9   7    0.8     No
    19     42.7  15    2.0    Yes
    20     20.6  18    1.7    Yes
    21     62.0   8    1.3     No
    22     53.1   7    1.6     No
    23     28.9  16    2.2    Yes
    24    177.4   5    1.1     No
    25     24.8  14    1.5    Yes
    26     75.3  14    2.3    Yes
    27     51.6   7    1.4     No
    28     36.1   9    1.1     No
    29    116.1   6    1.1     No
    30     28.1  16    2.5    Yes
    31      8.7  19    2.2    Yes
    32    105.1   6    0.8     No
    33     46.0  15    3.0    Yes
    34    102.6   7    1.2     No
    35     15.8  15    2.2     No
    36     60.0   7    1.3     No
    37     96.4  13    2.6     No
    38     24.2  14    1.7     No
    39     14.5  15    2.4     No
    40     36.6  14    1.5     No
    41     65.7   5    0.6     No
    42    116.3   7    1.6     No
    43    113.6   8    1.0     No
    44     16.7  15    4.3    Yes
    45     66.0   7    1.0     No
    46     60.7   7    1.0     No
    47     90.6   7    0.7     No
    48     91.3   7    1.3     No
    49     14.4  18    3.1    Yes
    50     72.8  14    3.0    Yes

You can write your own function to get such a summary into a data.frame:

    # Define the summary function
    my.summary <- function(x, na.rm = TRUE) {
      c(Mean   = mean(x, na.rm = na.rm),
        SD     = sd(x, na.rm = na.rm),
        Median = median(x, na.rm = na.rm),
        Min    = min(x, na.rm = na.rm),
        Max    = max(x, na.rm = na.rm),
        N      = length(x))
    }

    # Identify the numeric columns
    ind <- sapply(df, is.numeric)

    # Apply the function to the numeric columns only
    sapply(df[, ind], my.summary)

            Distance       Age     Height
    Mean    58.67200 11.840000  1.9160000
    SD      45.48137  4.604168  0.9796626
    Median  48.80000 13.500000  1.7000000
    Min      8.70000  4.000000  0.6000000
    Max    241.80000 19.000000  5.0000000
    N       50.00000 50.000000 50.0000000
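Note that `sapply` returns a matrix with the statistics in rows, while the question asks for a data frame. One possible way to get there is to transpose the matrix and convert it. This is only a sketch: the name `survey` is hypothetical, only the first six rows of the survey are typed in to keep it short, and `N` here counts non-missing values, which is an assumption that differs slightly from `length(x)`.

```r
# First six rows of the survey, entered by hand for illustration
survey <- data.frame(
  Distance = c(21.4, 13.9, 23.9, 8.7, 241.8, 44.5),
  Age      = c(18, 17, 16, 18, 6, 17),
  Height   = c(3.3, 3.4, 2.9, 3.6, 0.7, 1.3),
  Coning   = c("Yes", "Yes", "Yes", "No", "No", "Yes")
)

my.summary <- function(x, na.rm = TRUE) {
  c(Mean   = mean(x, na.rm = na.rm),
    SD     = sd(x, na.rm = na.rm),
    Median = median(x, na.rm = na.rm),
    Min    = min(x, na.rm = na.rm),
    Max    = max(x, na.rm = na.rm),
    N      = sum(!is.na(x)))   # assumption: count only non-missing values
}

# Keep the continuous columns, summarise, then turn the matrix into a data frame
num_cols <- sapply(survey, is.numeric)
stats_df <- as.data.frame(t(sapply(survey[, num_cols], my.summary)))
stats_df
```

After transposing, each variable is a row and each statistic is a column, which is usually the easier shape to export or merge.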

Or you can use the `basicStats` function from the fBasics package for a more detailed summary:

    > library(fBasics)
    > basicStats(df[, ind])
                   Distance        Age    Height
    nobs          50.000000  50.000000 50.000000
    NAs            0.000000   0.000000  0.000000
    Minimum        8.700000   4.000000  0.600000
    Maximum      241.800000  19.000000  5.000000
    1. Quartile   28.300000   7.000000  1.125000
    3. Quartile   74.675000  15.750000  2.475000
    Mean          58.672000  11.840000  1.916000
    Median        48.800000  13.500000  1.700000
    Sum         2933.600000 592.000000 95.800000
    SE Mean        6.432037   0.651128  0.138545
    LCL Mean      45.746337  10.531510  1.637583
    UCL Mean      71.597663  13.148490  2.194417
    Variance    2068.555118  21.198367  0.959739
    Stdev         45.481371   4.604168  0.979663
    Skewness       1.711028  -0.158853  0.905415
    Kurtosis       3.753948  -1.574527  0.578684

**Summary Statistics and Graphs with R** (contributing author: Ching-Ti Liu, PhD, Associate Professor, Biostatistics): If you need a quick overview of your dataset, you can always use the R command `str()` to look at its structure. But this only tells you the classes of your variables and the number of observations. Likewise, the function `head()` gives you, at best, an idea of how the data is stored in the dataset.
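As a quick illustration of that first look, the sketch below uses the built-in `mtcars` data set (chosen here only for illustration; it is not named in the quoted text) and adds `summary()`, which reports per-column quartiles and means on top of what `str()` and `head()` show.

```r
# Structure: class of each variable and number of observations
str(mtcars)

# First rows: shows how the data are stored
first_rows <- head(mtcars, 3)
first_rows

# Per-column descriptive statistics (min, quartiles, mean, max)
overview <- summary(mtcars)
overview
```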

The following use of `do.call`, `rbind` and `sapply` provides a summary for each column that has the class 'numeric'. You can write your own statistics function if you need different statistics than those of `summary` (see the answer of @Jilber).

    mtcars$carb = as.factor(mtcars$carb)  # Forcing one column to a factor

    do.call('rbind', sapply(mtcars, function(x) if(is.numeric(x)) summary(x)))

           Min. 1st Qu.  Median     Mean 3rd Qu.    Max.
    mpg  10.400  15.420  19.200  20.0900   22.80  33.900
    cyl   4.000   4.000   6.000   6.1880    8.00   8.000
    disp 71.100 120.800 196.300 230.7000  326.00 472.000
    hp   52.000  96.500 123.000 146.7000  180.00 335.000
    drat  2.760   3.080   3.695   3.5970    3.92   4.930
    wt    1.513   2.581   3.325   3.2170    3.61   5.424
    qsec 14.500  16.890  17.710  17.8500   18.90  22.900
    vs    0.000   0.000   0.000   0.4375    1.00   1.000
    am    0.000   0.000   0.000   0.4062    1.00   1.000
    gear  3.000   3.000   4.000   3.6880    4.00   5.000
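The same pattern works with a statistics function of your own. The following is a minimal sketch, assuming you only want the mean and standard deviation; `mean_sd` is a hypothetical helper name, and the built-in `iris` data is used because it already contains one factor column.

```r
# Mean and standard deviation for one numeric vector
mean_sd <- function(x) {
  c(Mean = mean(x, na.rm = TRUE), SD = sd(x, na.rm = TRUE))
}

# Filter() keeps only the numeric columns, so no NULL entries are produced
numeric_cols <- Filter(is.numeric, iris)

# One row per column, one column per statistic
stats <- do.call(rbind, lapply(numeric_cols, mean_sd))
stats
```

Selecting the numeric columns up front with `Filter()` is a design choice; it avoids relying on `rbind` silently dropping the `NULL` results for non-numeric columns.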

**Formatted Summary Statistics and Data Summary Tables with …:** The `n_perc` function is the workhorse, but `n_perc0` is also provided for ease of use, in the same way that base R has `paste` and `paste0`.

Here are some examples using `data.table`. I'm using the functions defined in the previous answers.

    my.summary <- function(x, na.rm = TRUE) {
      c(Mean   = mean(x, na.rm = na.rm),
        SD     = sd(x, na.rm = na.rm),
        Median = median(x, na.rm = na.rm),
        Min    = min(x, na.rm = na.rm),
        Max    = max(x, na.rm = na.rm),
        N      = length(x))
    }

    # Simulated example data
    set.seed(123)
    df <- data.frame(id       = 1:1000,
                     Distance = rnorm(1000, 50, 100),
                     Age      = rnorm(1000, 50, 100),
                     Height   = rnorm(1000, 50, 100))
    df$Coning <- as.factor(ifelse(df$Distance > 0, "Yes", "No"))

    library(fBasics)
    library(data.table)

    DT <- data.table(df)
    setkey(DT, id)

Group by the factor variable "Coning":

    DT[, lapply(.SD, my.summary), by = "Coning"]
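In the grouped result, the six statistics come back as six unlabeled rows per group. One way to keep track of which row is which is to add a label column in the same call. This is a sketch under assumptions: it needs the data.table package installed, uses a smaller simulated table than the answer, and the names `stat_names`, `six_stats` and `by_group` are made up for illustration.

```r
library(data.table)

# Small simulated table, built the same way as in the answer
set.seed(123)
DT <- data.table(Distance = rnorm(100, 50, 100),
                 Age      = rnorm(100, 50, 100))
DT[, Coning := factor(ifelse(Distance > 0, "Yes", "No"))]

stat_names <- c("Mean", "SD", "Median", "Min", "Max", "N")
six_stats  <- function(x) c(mean(x), sd(x), median(x), min(x), max(x), length(x))

# For each group, return a label column plus one column of statistics per variable
by_group <- DT[, c(list(Stat = stat_names), lapply(.SD, six_stats)),
               by = Coning, .SDcols = c("Distance", "Age")]
by_group
```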

Using **my.summary()** and **basicStats()**
Numeric variables only:

    DT[, lapply(.SD, my.summary), .SDcols = names(DT)[2:4]]

    # Statistic names, taken from the row names of a basicStats() result
    znames <- rownames(basicStats(DT$Distance))

    BS <- DT[, sapply(.SD, basicStats), .SDcols = names(DT)[2:4]]
    BS[, summary := znames]
    setnames(BS, 1:3, names(DT)[2:4])
    BS

    DT[, lapply(.SD, summary), .SDcols = names(DT)[2:4]]

Using **summary()**
Numeric variables:

    DT[, sapply(.SD, function(x) if (is.numeric(x)) summary(x)), .SDcols = names(DT)[2:4]]

Factor variable:

    DT[, sapply(.SD, function(x) if (is.factor(x)) summary(x)), .SDcols = names(DT)[5]]

Using the quantile function is also quite useful:

    DT[, sapply(.SD, function(x) if (is.numeric(x)) quantile(x)), .SDcols = names(DT)[2:4]]
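By default `quantile()` returns the minimum, the quartiles and the maximum. If other percentiles are of interest, its `probs` argument can be passed through `sapply`. A small base R sketch, using the built-in `iris` data rather than the simulated table above:

```r
# 10th, 50th and 90th percentiles for every numeric column of iris
pct <- sapply(Filter(is.numeric, iris), quantile, probs = c(0.1, 0.5, 0.9))
pct
```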

**Descriptive Statistics:** R provides a wide range of functions for obtaining summary statistics. One method of obtaining descriptive statistics is to use the `sapply()` function with a specified summary statistic. Possible functions used in `sapply` include `mean`, `sd`, `var`, `min`, `max`, `median`, `range`, and `quantile`.
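For instance, extra arguments such as `na.rm = TRUE` are passed on by `sapply` to the statistic. The sketch below uses the built-in `airquality` data, picked here only because it contains missing values:

```r
# airquality ships with R and has missing values in Ozone and Solar.R
col_means <- sapply(airquality, mean,  na.rm = TRUE)
col_sds   <- sapply(airquality, sd,    na.rm = TRUE)
col_range <- sapply(airquality, range, na.rm = TRUE)  # 2 x 6 matrix: min and max

col_means
col_sds
col_range
```

Without `na.rm = TRUE`, the mean of any column containing an `NA` would itself be `NA`.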

**Summarizing Data in R (Descriptive Statistics):** This tutorial describes how to perform basic descriptive statistics using data frames in R. The examples in this tutorial use historic weather data … When a data set has outliers or extreme values, we summarize a typical value using the median as opposed to the mean. When a data set has outliers, variability is often summarized by the interquartile range, which is the difference between the first and third quartiles.
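To see why the median and interquartile range are preferred when outliers are present, here is a small sketch using the first six `Distance` values from the survey in the question, one of which (241.8) is far from the rest. The variable names are illustrative only.

```r
# First six Distance values from the survey; 241.8 is an extreme value
distance <- c(21.4, 13.9, 23.9, 8.7, 241.8, 44.5)

# Typical value: the mean is pulled towards the outlier, the median is not
typical <- c(mean = mean(distance), median = median(distance))

# Variability: sd is inflated by the outlier, IQR is Q3 - Q1
spread <- c(sd = sd(distance), IQR = IQR(distance))

typical
spread
```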

**[PDF] Exploring Data and Descriptive Statistics (using R):** R is a programming language used for statistical analysis; `summary(mydata)` gives a quick summary of a data frame. So, the question is: if you can create subtotals in spreadsheets and databases, can you do it in R? You bet you can. In the dplyr package, you can create subtotals by combining the `group_by()` function and the `summarise()` function, for example on the `mtcars` data frame that ships with R.
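A minimal sketch of that `group_by()` plus `summarise()` combination, assuming the dplyr package is installed. The choice of grouping by `cyl` and summarising `mpg` is mine, for illustration only.

```r
library(dplyr)

# One row per number of cylinders, with statistics for miles per gallon
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg   = mean(mpg),
    sd_mpg     = sd(mpg),
    median_mpg = median(mpg),
    n          = n()
  )

by_cyl
```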

**Descriptive Statistics in R:** We can summarize our data in R as follows. Descriptive/summary statistics: with the help of descriptive statistics, we can represent the … Hot on the heels of delving into the world of R frequency table tools, it's now time to expand the scope and think about data summary functions in general. One of the first steps analysts should perform when working with a new dataset is to review its contents and shape.

##### Comments

- Some other alternatives: statmethods.net/stats/descriptives.html
- Shameless plug: `cgwtools::mystat`.
- I've been reading Hadley's book on programming. It describes a nice way to write this kind of function that avoids duplication in the definition: `summary <- function(x) { funs <- c(mean, median, sd, mad, IQR); lapply(funs, function(f) f(x, na.rm = TRUE)) }`

- @marbel, superb, thanks for that! Exactly what I was hoping to find in the answers or comments!
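The list-of-functions idea from the comments can be taken one step further by naming the list, so the result is labeled. This is a sketch only: `summarise_with` is a hypothetical name (chosen to avoid masking base R's `summary`), and every function in the list is assumed to accept `na.rm`.

```r
# Apply each function in a named list to x and return a named numeric vector
summarise_with <- function(x,
                           funs = list(mean = mean, median = median,
                                       sd = sd, mad = mad, IQR = IQR)) {
  sapply(funs, function(f) f(x, na.rm = TRUE))
}

result <- summarise_with(c(1, 2, 3, 4, NA, 100))
result
```

Adding or removing a statistic then means editing only the list, not the body of the function.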