How to create a new column, returning i based on another column in R

I have a dataframe

employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA,'2007-3-14'))

employ.data <- data.frame(employee, salary, startdate)

I want a new column, employ.data$NA, that returns employ.data$employee[i] if any other column in row i is NA.

I have tried this for one column but am getting errors:

employ.data$NA = NA 
{for (i in 1:nrow(Eurostat)) 
  {
  if (startdate[i] = "NA")  employ.data$employee[i]
}

Any help would be appreciated.


You need complete.cases() from base R:

employ.data$missingFlag <- !complete.cases(employ.data)

    employee salary  startdate missingFlag
1   John Doe  21000 2010-11-01       FALSE
2 Peter Gynn     NA       <NA>        TRUE
3 Jolie Hope  26800 2007-03-14       FALSE
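complete.cases() also accepts a subset of columns, or several vectors, which helps when only some variables should count as "missing". A minimal sketch, rebuilding the question's data and assuming only salary and startdate matter; flag2 is an illustrative name.

```r
# Rebuild the example data from the question
employee  <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary    <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA, '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)

# Only look for NAs in the salary and startdate columns
employ.data$missingFlag <- !complete.cases(employ.data[, c("salary", "startdate")])

# complete.cases() can also take several vectors and combine them row by row
flag2 <- !complete.cases(employ.data$salary, employ.data$startdate)

# Logical indexing then returns just the incomplete rows
employ.data[employ.data$missingFlag, ]
```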


Try to vectorize it and use an ifelse statement:

employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
employ.data["missing"] = with(employ.data, ifelse(is.na(startdate), employee, NA))
employ.data
    employee salary  startdate    missing
1   John Doe  21000 2010-11-01       <NA>
2 Peter Gynn     NA       <NA> Peter Gynn
3 Jolie Hope  26800 2007-03-14       <NA>
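The stringsAsFactors argument in the answer above matters. Before R 4.0.0, data.frame() converted character columns to factors by default, and ifelse() does not keep the factor class, so it returns the underlying integer level codes instead of the names. A small sketch of the difference (as_chr, as_fct and fixed are illustrative names):

```r
employee  <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
startdate <- as.Date(c('2010-11-1', NA, '2007-3-14'))

# Character vector: ifelse() returns the employee name where startdate is NA
as_chr <- ifelse(is.na(startdate), employee, NA)

# Factor: ifelse() drops the factor class and returns the integer level code
# ("Peter Gynn" is the third level once the levels are sorted)
emp_fct <- factor(employee)
as_fct  <- ifelse(is.na(startdate), emp_fct, NA)

# Converting the factor to character first avoids the problem
fixed <- ifelse(is.na(startdate), as.character(emp_fct), NA)
```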

Alternatively, to check all columns, use any:

employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
employ.data["something_missing"] = apply(employ.data, 1, function(x) any(is.na(x)))
employ.data
    employee salary  startdate something_missing
1   John Doe  21000 2010-11-01             FALSE
2 Peter Gynn     NA       <NA>              TRUE
3 Jolie Hope  26800 2007-03-14             FALSE

The construct above will give you booleans. If you want to get a column of the names, you can combine it with ifelse.
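Combining the row-wise any(is.na(x)) check with ifelse() gives the column of names the question asks for. A minimal sketch, assuming the question's data; row_has_na and missing_name are illustrative names.

```r
employee  <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary    <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA, '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)

# TRUE for every row that has at least one NA in any column
row_has_na <- apply(employ.data, 1, function(x) any(is.na(x)))

# Keep the employee name for those rows, NA otherwise
employ.data$missing_name <- ifelse(row_has_na, employ.data$employee, NA)
employ.data
```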

On a more general note, instantiating a column first and then looping through the data frame to populate it is not particularly Rtistic, and I would suggest avoiding this strategy whenever possible. The apply family of functions is very powerful, and so is ifelse. dplyr's mutate combined with case_when statements can also be used if you want something more SQL-like.
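A fully vectorized option avoids both the loop and apply(): is.na() on a whole data frame returns a logical matrix, and rowSums() counts the NAs in each row. It also sidesteps the fact that apply() first converts a mixed-type data frame into a character matrix. A sketch under the same assumptions (question data, illustrative names na_count and missing):

```r
employee  <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary    <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA, '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)

# is.na() on a data frame gives a logical matrix, one column per variable;
# rowSums() then counts the NAs in each row
na_count <- rowSums(is.na(employ.data))

# Employee name for rows with at least one NA, otherwise NA
employ.data$missing <- ifelse(na_count > 0, employ.data$employee, NA)
employ.data
```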

Just for pedagogical reasons, here is a working version of your code. Please don't use it; just try to understand the differences.

employ.data$missing <- NA
for (i in seq_len(nrow(employ.data))) {
  # Copy the employee name only when startdate is missing in row i
  if (is.na(employ.data$startdate[i])) {
    employ.data$missing[i] <- employ.data$employee[i]
  }
}

Importantly, note that "NA" in quotes is interpreted as a string, and that = inside if () is assignment syntax, not the comparison operator ==. To test whether a value is NA, you need to use e.g. is.na. A comparison such as 42 == NA cannot be decided: the value is missing, so it may or may not be equal to 42, and the test returns NA.
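A quick check of the point above: comparing anything with NA yields NA, if () refuses an NA condition, and is.na() gives a definite TRUE or FALSE. The quoted string "NA" is an ordinary value, not a missing one. The variable names here are illustrative.

```r
x <- NA

cmp     <- 42 == x       # NA: the comparison cannot be decided
chk     <- is.na(x)      # TRUE: the proper test for a missing value
str_chk <- is.na("NA")   # FALSE: "NA" is just a two-character string

# if () with an NA condition raises an error; tryCatch() captures its message
msg <- tryCatch(
  if (42 == x) "equal" else "not equal",
  error = function(e) conditionMessage(e)
)
print(msg)
```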


It can be done quite easily with dplyr:

library(dplyr)

employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA,'2007-3-14'))

employ.data <- data.frame(employee, salary, startdate)

employ.data <- employ.data %>% 
  rowwise() %>% 
  mutate(missing = any(is.na(c(salary, startdate))))
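rowwise() evaluates the mutate() expression once per row, which is typically slower on large data. Because is.na() is already vectorized, the same two-column check can also be written in base R without any row loop; a sketch assuming the question's data:

```r
employee  <- c('John Doe', 'Peter Gynn', 'Jolie Hope')
salary    <- c(21000, NA, 26800)
startdate <- as.Date(c('2010-11-1', NA, '2007-3-14'))
employ.data <- data.frame(employee, salary, startdate)

# Element-wise OR over whole columns: one logical value per row
employ.data$missing <- is.na(employ.data$salary) | is.na(employ.data$startdate)
employ.data
```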
