In R, what is a good way to aggregate String data

In R (or S-PLUS), what is a good way to aggregate String data in a data frame?

Consider the following:

myList <- as.data.frame(c("Bob", "Mary", "Bob", "Bob", "Joe"))

I would like the output to be:

 [Bob,  3
  Mary, 1
  Joe,  1]

Currently, the only way I know how to do this is with the summary function.

> summary(as.data.frame(myList))

 Bob :3                                
 Joe :1                                
 Mary:1      

This feels like a hack. Can anyone suggest a better way?

Using table, no need to sort:

ctable <- table(myList);
counts <- data.frame(Name = names(ctable),Count = as.vector(ctable));
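
A sketch of the same table() idea that skips the manual data.frame() step, along the lines of the simplification suggested in the comments. It assumes a plain character vector as input (names_vec is just an illustrative name). as.data.frame() on a table accepts responseName to name the count column and stringsAsFactors = FALSE to keep the names as character. The final order() call, which puts the most frequent names first, is an optional extra.

```r
# Illustrative input: the same names as in the question, as a plain vector
names_vec <- c("Bob", "Mary", "Bob", "Bob", "Joe")

# table() tallies each distinct string; its levels are sorted alphabetically
ctable <- table(names_vec)

# One row per distinct name; responseName sets the count column's name
counts <- as.data.frame(ctable, responseName = "Count",
                        stringsAsFactors = FALSE)
names(counts)[1] <- "Name"

# Optional: most frequent first (order() keeps ties in their original order)
counts <- counts[order(-counts$Count), ]
rownames(counts) <- NULL
counts
```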

This is a combination of the above answers (as suggested by Thierry)

data.frame(table(myList[,1]))

which gives you

  Var1 Freq
1  Bob    3
2  Joe    1
3 Mary    1
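
For completeness, base R's aggregate() can produce the same counts. This is a minimal sketch, assuming the names sit in a column called Name (a hypothetical column name; the question's data frame has an unnamed column). The by argument of aggregate() must be a list, and counting rows per group amounts to applying length() to each group.

```r
df <- data.frame(Name = c("Bob", "Mary", "Bob", "Bob", "Joe"),
                 stringsAsFactors = FALSE)

# Split df$Name by the groups in `by` (which must be a list) and apply
# length() to each piece, i.e. count the rows in each group
counts <- aggregate(df$Name, by = list(Name = df$Name), FUN = length)

# When given a bare vector, aggregate() calls the value column "x"; rename it
names(counts)[2] <- "Count"
counts
```

This is more verbose than table() for a plain count, but the same pattern works when FUN is something other than length().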

Do you mean like this?

myList <- c("Bob", "Mary", "Bob", "Bob", "Joe")
r <- rle(sort(myList))
# Build the data frame directly: cbind() would coerce the counts to character
result <- data.frame(Name = r$values, Occurrences = r$lengths)
result
  Name Occurrences
1  Bob           3
2  Joe           1
3 Mary           1
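
Why the sort() matters here: rle() (run-length encoding) only counts runs of consecutive equal values, so on the unsorted vector Bob is split across two runs. A small sketch comparing both:

```r
x <- c("Bob", "Mary", "Bob", "Bob", "Joe")

# Without sorting, equal strings that are not adjacent form separate runs:
# "Bob" appears as a run of 1 and, later, a run of 2
unsorted_runs <- rle(x)

# Sorting first makes every equal string contiguous, so each name is one run
sorted_runs <- rle(sort(x))

unsorted_runs$lengths
sorted_runs$lengths
```

Because of this, the rle() approach only gives correct totals on sorted input, whereas table() does not depend on the input order.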

Using data.table

myList <- data.frame(v1=c("Bob", "Mary", "Bob", "Bob", "Joe"))
library(data.table)
# .N is the number of rows in each group defined by `by`
setDT(myList)[, .N, by = v1]
     v1 N
1:  Bob 3
2: Mary 1
3:  Joe 1
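
Note that data.table's by keeps groups in order of first appearance (Bob, Mary, Joe), while table() sorts them alphabetically. If you want the data.table-style ordering without the extra package, a base R sketch is to fix the factor levels to unique(x) before tabulating (the variable names here are only illustrative):

```r
x <- c("Bob", "Mary", "Bob", "Bob", "Joe")

# Levels in order of first appearance instead of alphabetical order
first_seen <- factor(x, levels = unique(x))

# table() follows the factor's level order; name the columns like the
# data.table output above
counts <- as.data.frame(table(first_seen), responseName = "N",
                        stringsAsFactors = FALSE)
names(counts)[1] <- "v1"
counts
```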

Using sqldf library:

require(sqldf)

myList<- data.frame(v=c("Bob", "Mary", "Bob", "Bob", "Joe"))
sqldf("SELECT v,count(1) FROM myList GROUP BY v")
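
If you would rather not pull in SQL for this, base R's tapply() is a rough analogue of SELECT v, count(1) ... GROUP BY v: it splits its first argument by the groups in its second argument and applies FUN to each piece. This is a swapped-in base R technique, sketched under the assumption that the column is called v as in the sqldf example:

```r
myList <- data.frame(v = c("Bob", "Mary", "Bob", "Bob", "Joe"),
                     stringsAsFactors = FALSE)

# Group myList$v by its own values and count the elements in each group;
# the result is a named (1-D array) vector with one entry per distinct name
counts <- tapply(myList$v, myList$v, length)
counts
```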

Comments
  • you can simplify the last line to as.data.frame(ctable). Note that the semicolons are only needed if you put more than one command on a line.
  • It gives an error for me - a one-liner based on Thierry's suggestion would be: as.data.frame(table(myList))
  • That's interesting. What kind of error message did you get? I just tried it without getting an error message.
  • Scratch that - I tried it after defining myList as a list, not a data.frame.