Adding a column to a data.frame

I have the data.frame below. I want to add a column that classifies my data according to column 1 (h_no), such that the first series of h_no (1 to 4) is class 1, the second series of h_no (1 to 7) is class 2, and so on, as indicated in the last column.

h_no  h_freq  h_freqsq    class
1     0.09091 0.008264628 1
2     0.00000 0.000000000 1
3     0.04545 0.002065702 1
4     0.00000 0.000000000 1  
1     0.13636 0.018594050 2
2     0.00000 0.000000000 2
3     0.00000 0.000000000 2
4     0.04545 0.002065702 2
5     0.31818 0.101238512 2
6     0.00000 0.000000000 2
7     0.50000 0.250000000 2 
1     0.13636 0.018594050 3 
2     0.09091 0.008264628 3
3     0.40909 0.167354628 3
4     0.04545 0.002065702 3

You can add a column to your data using various techniques. The quotes below come from the "Details" section of the relevant help text, [[.data.frame.

Data frames can be indexed in several modes. When [ and [[ are used with a single vector index (x[i] or x[[i]]), they index the data frame as if it were a list.

my.dataframe["new.col"] <- a.vector
my.dataframe[["new.col"]] <- a.vector

The data.frame method for $, treats x as a list

my.dataframe$new.col <- a.vector

When [ and [[ are used with two indices (x[i, j] and x[[i, j]]) they act like indexing a matrix

my.dataframe[ , "new.col"] <- a.vector

If you don't specify whether you are working with rows or columns (that is, you supply a single index with no comma), the data.frame method assumes you mean columns.
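To see that these forms are interchangeable, here is a minimal sketch. The data frame `df` and the vector `v` are made-up names for illustration and are not part of the question.

```r
# Hypothetical data, only for illustration
df <- data.frame(x = 1:3)
v <- c(10, 20, 30)

df1 <- df; df1["new.col"] <- v     # list-style, single bracket
df2 <- df; df2[["new.col"]] <- v   # list-style, double bracket
df3 <- df; df3$new.col <- v        # $ method
df4 <- df; df4[, "new.col"] <- v   # matrix-style, explicit column index

# Compare the four results pairwise
same <- isTRUE(all.equal(df1, df2)) &&
  isTRUE(all.equal(df2, df3)) &&
  isTRUE(all.equal(df3, df4))
```

Under these assumptions `same` should be TRUE, so the choice between the forms is mostly a matter of style.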


For your example, this should work:

# make some fake data
your.df <- data.frame(no = c(1:4, 1:7, 1:5), h_freq = runif(16), h_freqsq = runif(16))

# find the rows where no == 1, i.e. where each series starts
from <- which(your.df$no == 1)
to <- c((from-1)[-1], nrow(your.df)) # up to which point the sequence runs

# generate a sequence (len) and based on its length, repeat a consecutive number len times
get.seq <- mapply(from, to, 1:length(from), FUN = function(x, y, z) {
            len <- length(seq(from = x[1], to = y[1]))
            return(rep(z, times = len))
         })

# when we unlist, we get a vector
your.df$group <- unlist(get.seq)
# and append it to your original data.frame. since this is
# designating a group, it makes sense to make it a factor
your.df$group <- as.factor(your.df$group)


   no     h_freq   h_freqsq group
1   1 0.40998238 0.06463876     1
2   2 0.98086928 0.33093795     1
3   3 0.28908651 0.74077119     1
4   4 0.10476768 0.56784786     1
5   1 0.75478995 0.60479945     2
6   2 0.26974011 0.95231761     2
7   3 0.53676266 0.74370154     2
8   4 0.99784066 0.37499294     2
9   5 0.89771767 0.83467805     2
10  6 0.05363139 0.32066178     2
11  7 0.71741529 0.84572717     2
12  1 0.10654430 0.32917711     3
13  2 0.41971959 0.87155514     3
14  3 0.32432646 0.65789294     3
15  4 0.77896780 0.27599187     3
16  5 0.06100008 0.55399326     3
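The same grouping can probably be obtained without mapply(), by handing the run lengths straight to rep(). This is a sketch of a variant, under the same assumption that every series starts with no == 1. The set.seed() call and the name `run.len` are additions for illustration.

```r
set.seed(1)  # assumption: fixed seed so the fake data is reproducible
your.df <- data.frame(no = c(1:4, 1:7, 1:5), h_freq = runif(16), h_freqsq = runif(16))

from <- which(your.df$no == 1)         # row where each series starts
to <- c(from[-1] - 1, nrow(your.df))   # row where each series ends
run.len <- to - from + 1               # number of rows in each series

# repeat the series number 1, 2, 3, ... once per row of that series
your.df$group <- factor(rep(seq_along(from), times = run.len))
```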


Easily: Your data frame is A

b <- A[,1]
b <- b==1
b <- cumsum(b)

Then you get the column b.
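As a quick check, here is the same idea applied to the h_no values from the question. The column name `class` is chosen for illustration, and the approach assumes that every series starts at 1.

```r
# h_no values copied from the question (other columns omitted)
A <- data.frame(h_no = c(1:4, 1:7, 1:4))

# TRUE marks the start of each series; cumsum() counts the starts seen so far
A$class <- cumsum(A$h_no == 1)
```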


If I understand the question correctly, you want to detect when h_no doesn't increase and then increment the class. (I'm going to walk through how I solved this problem; there is a self-contained function at the end.)

Working

We only care about the h_no column for the moment, so we can extract that from the data frame:

> h_no <- data$h_no

We want to detect when h_no doesn't go up, which we can do by working out when the difference between successive elements is either negative or zero. R provides the diff function which gives us the vector of differences:

> d.h_no <- diff(h_no)
> d.h_no
 [1]  1  1  1 -3  1  1  1  1  1  1 -6  1  1  1

Once we have that, it is a simple matter to find the ones that are non-positive:

> nonpos <- d.h_no <= 0
> nonpos
 [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
[13] FALSE FALSE

In R, TRUE and FALSE are basically the same as 1 and 0, so if we get the cumulative sum of nonpos, it will increase by 1 in (almost) the appropriate spots. The cumsum function (which is basically the opposite of diff) can do this.

> cumsum(nonpos)
 [1] 0 0 0 1 1 1 1 1 1 1 2 2 2 2

But, there are two problems: the numbers are one too small; and, we are missing the first element (there should be four in the first class).

The first problem is simply solved: 1+cumsum(nonpos). And the second just requires adding a 1 to the front of the vector, since the first element is always in class 1:

> classes <- c(1, 1 + cumsum(nonpos))
> classes
 [1] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3

Now, we can attach it back onto our data frame with cbind (by using the class= syntax, we can give the column the class heading):

> data_w_classes <- cbind(data, class=classes)

And data_w_classes now contains the result.

Final result

We can compress the lines together and wrap it all up into a function to make it easier to use:

classify <- function(data) {
   cbind(data, class=c(1, 1 + cumsum(diff(data$h_no) <= 0)))
}

Or, since it makes sense for the class to be a factor:

classify <- function(data) {
   cbind(data, class=factor(c(1, 1 + cumsum(diff(data$h_no) <= 0))))
}

You use either function like:

> classified <- classify(data) # doesn't overwrite data
> data <- classify(data) # data now has the "class" column

(This method of solving the problem is good because it avoids explicit iteration, which is generally recommended in R, and avoids generating lots of intermediate vectors, lists, etc. It's also kinda neat that it can be written on one line :) )
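For a concrete run, the sketch below applies classify to the h_no values from the question, and to a made-up series that does not start at 1. The second data set and the name `odd` are hypothetical and only show that the diff-based test does not depend on the value 1.

```r
classify <- function(data) {
  cbind(data, class = c(1, 1 + cumsum(diff(data$h_no) <= 0)))
}

# h_no values from the question (other columns omitted for brevity)
data <- data.frame(h_no = c(1:4, 1:7, 1:4))
classified <- classify(data)

# hypothetical series that do not start at 1: a new class still begins
# wherever h_no fails to increase
odd <- classify(data.frame(h_no = c(2, 3, 5, 3, 4, 4, 6)))
```

Here `classified$class` should contain four 1s, seven 2s and four 3s, and `odd$class` should be 1 1 1 2 2 3 3.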


In addition to Roman's answer, something like this might be even simpler. Note that I haven't tested it because I do not have access to R right now.

# Note that I use a global variable here
# normally not advisable, but I liked the
# use here to make the code shorter
index <<- 0
new_column = sapply(df$h_no, function(x) {
  # <<- is needed so the global counter is updated, not a local copy
  if(x == 1) index <<- index + 1
  return(index)
})

The function iterates over the values in h_no and always returns the category that the current value belongs to. If a value of 1 is detected, we increase the global variable index and continue.
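If the global variable is a concern, the counter can be kept inside a wrapper function instead. This is a sketch of that variation. The helper name `add_index` is hypothetical, and it assumes every series starts with h_no == 1.

```r
# hypothetical helper: the counter lives inside the function,
# so nothing in the global environment is modified
add_index <- function(h_no) {
  index <- 0
  sapply(h_no, function(x) {
    # <<- updates index in add_index()'s environment
    if (x == 1) index <<- index + 1
    index
  })
}

# h_no values from the question
classes <- add_index(c(1:4, 1:7, 1:4))
```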


Data.frame[, 'h_new_column'] <- cumsum(Data.frame[, 'h_no'] == 1)


Comments
  • What's the difference between the last two methods of adding a column?
  • @huon-dbaupp the method with a comma is explicit and will also work on matrices, while the last one works on data.frames only. If no comma is provided, R assumes you mean columns.
  • Nice and short. I would just change the last element so that instead of being cumsum(b) -> b the result would be directly added as a column to the original data frame, something like A$groups <- cumsum(b).
  • cumsum(b) will give you a vector of length 3, or am I missing something?
  • @RomanLuštrik, see dbaupp's solution which explains how cumsum would work in this case.
  • @RomanLuštrik, This solution can be rewritten really nicely in a single line. Using your your.df data, you can simply do your.df$group = cumsum(your.df[, 1]==1) to get your new group column.
  • I like the hack with the global variable. So Cish. :P