How to replace duplicate questions/values in a row

I am about to spread a table, but I've run into a problem with my data. The data come from a questionnaire: the questions are in one column and the answers are in the next column. The file contains about 20,000 questionnaires, all stacked underneath each other.

It looks like this:

*Participant*   |      *Question*      |        *Answer* 
Paul            |    Age               |         15
Paul            |    City              |      Amsterdam
Paul            |    Pet_name          |       Butterfly
Paul            |    Fav_color         |       Pink
Paul            |    Parent_name       |       Hank
Paul            |    Parent_name       |       Mary
Adam            |    Age               |         78
Adam            |    City              |         LA
Adam            |    Pet_name          |       Crocodile
Adam            |    Fav_color         |       Purple
Adam            |    Parent_name       |       Pete
Adam            |    Parent_name       |       Peter

The problem is that I can't spread when two questions have the same name, in this case "Parent_name".

So preferably I'd like to rename the second occurrence of Parent_name per Participant to something like Parent2_name or Parent_name2.

What I've tried is identifying duplicated values with the duplicated() function; however, since every participant answers the same questions, it flags everything from the second participant onward as duplicated (see the sketch after the example data below).

To recreate my example data:

Participant <- c('Paul','Paul','Paul','Paul','Paul','Paul', 'Adam', 'Adam', 'Adam', 'Adam', 'Adam', 'Adam' )
Question <- c('Age', 'City', 'Pet_name', 'Fav_color', 'Parent_name', 'Parent_name', 'Age', 'City', 'Pet_name', 'Fav_color', 'Parent_name', 'Parent_name')
Answer <- c('15', 'Amsterdam', 'Butterfly', 'Pink', 'Hank', 'Mary', '78', 'LA', 'Crocodile', 'Purple', 'Pete', 'Peter')

df <- data.frame(Participant, Question, Answer)
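
For reference, here is roughly what that duplicated() attempt returns on the example data (a minimal sketch): it flags every repeat of a question across the whole file, not just the second Parent_name within a participant.

duplicated(df$Question)
#  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE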

So the final product would look like:

*Participant*   |      *Question*      |        *Answer* 
Paul            |    Age               |         15
Paul            |    City              |      Amsterdam
Paul            |    Pet_name          |       Butterfly
Paul            |    Fav_color         |       Pink
Paul            |    Parent_name       |       Hank
Paul            |    Parent2_name      |       Mary
Adam            |    Age               |         78
Adam            |    City              |         LA
Adam            |    Pet_name          |       Crocodile
Adam            |    Fav_color         |       Purple
Adam            |    Parent_name       |       Pete
Adam            |    Parent2_name      |       Peter

We can group_by() Participant and Question and append the row_number() to Question whenever a group has more than one row. This will work for any Question with duplicated values.

library(dplyr)

df %>%
  group_by(Participant, Question) %>%
  mutate(Question1 = if (n() > 1) paste0(Question, row_number()) else Question) %>%
  ungroup() %>%
  select(-Question)

#   Participant Answer    Question1   
#   <chr>       <chr>     <chr>       
# 1 Paul        15        Age         
# 2 Paul        Amsterdam City        
# 3 Paul        Butterfly Pet_name    
# 4 Paul        Pink      Fav_color   
# 5 Paul        Hank      Parent_name1
# 6 Paul        Mary      Parent_name2
# 7 Adam        78        Age         
# 8 Adam        LA        City        
# 9 Adam        Crocodile Pet_name    
#10 Adam        Purple    Fav_color   
#11 Adam        Pete      Parent_name1
#12 Adam        Peter     Parent_name2

data

df <- data.frame(Participant, Question, Answer, stringsAsFactors = FALSE)
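
With the question names made unique, the spread the question describes should then go through. A minimal sketch, assuming tidyr is loaded and the cleaned result is saved as df_clean (a name used here only for illustration):

library(tidyr)

df_clean <- df %>%
  group_by(Participant, Question) %>%
  mutate(Question1 = if (n() > 1) paste0(Question, row_number()) else Question) %>%
  ungroup() %>%
  select(-Question)

spread(df_clean, Question1, Answer)   # one row per Participant, one column per question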

Doing this with data.table:

library(data.table)
Participant <- c('Paul','Paul','Paul','Paul','Paul','Paul', 'Adam', 'Adam', 'Adam', 'Adam', 'Adam', 'Adam' )
Question <- c('Age', 'City', 'Pet_name', 'Fav_color', 'Parent_name', 'Parent_name', 'Age', 'City', 'Pet_name', 'Fav_color', 'Parent_name', 'Parent_name')
Answer <- c('15', 'Amsterdam', 'Butterfly', 'Pink', 'Hank', 'Mary', '78', 'LA', 'Crocodile', 'Purple', 'Pete', 'Peter')

df <- data.table(Participant, Question, Answer)

Set a new column with a within-group ID by Participant and Question, append the ID to the original question wherever it is greater than one, and then remove the helper column.

df[, id := seq_len(.N), by = .(Participant, Question)]  # running count within Participant/Question
df[id != 1, Question := paste0(Question, id)]           # rename the 2nd, 3rd, ... occurrences
df[, id := NULL]                                        # drop the helper column

Result:

> df
    Participant     Question    Answer
 1:        Paul          Age        15
 2:        Paul         City Amsterdam
 3:        Paul     Pet_name Butterfly
 4:        Paul    Fav_color      Pink
 5:        Paul  Parent_name      Hank
 6:        Paul Parent_name2      Mary
 7:        Adam          Age        78
 8:        Adam         City        LA
 9:        Adam     Pet_name Crocodile
10:        Adam    Fav_color    Purple
11:        Adam  Parent_name      Pete
12:        Adam Parent_name2     Peter
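
If the end goal is the wide, one-row-per-participant table from the question, data.table's dcast() can take it from here (a sketch; it assumes df is the table produced above):

# one row per Participant, one column per (now unique) question
dcast(df, Participant ~ Question, value.var = "Answer")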

With dplyr:

library(dplyr)
library(tidyr)

df %>%
  group_by(Participant, Question) %>%
  mutate(Index = rank(Question, ties.method = "first")) %>%
  ungroup() %>%
  mutate(Question = paste(Question, Index, sep = "_")) %>%
  select(-Index) %>%
  spread(Question, Answer)

This uses rank() to sequence the questions within each participant and should work no matter the number of duplicates or their order. Note that every question gets a suffix (Age_1, City_1, and so on), which differs cosmetically from the desired names but still yields unique column names for spreading.
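
Either way, a quick sanity check before reshaping is to see which questions are still duplicated per participant. A minimal sketch with dplyr's count():

df %>%
  count(Participant, Question) %>%
  filter(n > 1)
# on the example data this should list only Parent_name, once per participant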

Comments
  • Hey, this works on my mock data, but in my actual data I have a couple more columns that I think cause an error. I tried this code, based on yours: group_by(Nummer, Leeftijd, Geslacht, Woonplaats, Datum, Vraag) %>% mutate(Vraag1 = if (n() > 1) paste0(Vraag, row_number()) else Question) and I get this error: Error: Column Vraag1 must be length 1 (the group size), not 12. Sorry the column names are in Dutch; I can translate if that makes it easier. All of these extra columns are like the Participant column, by the way: they repeat the values 6 times.
  • @JolienJansen please add another example if there is a different part of your data causing an error. Column names in any language are fine - the content does not concern us as much as the structure.
  • @cddt Will do that now
  • @JolienJansen Why are you grouping by other columns? Also, is your column name Vraag or Question?
  • Hey, so Question == Vraag in Dutch; in my own dataset the 'Question' column is called 'Vraag'. Even if I only group by 'Participant', my own dataset gives me the error "Error: Column Vraag1 must be length 1 (the group size), not 12". I tried to recreate my own dataset as closely as possible, but I only get the error on my own data, not the mock dataset. I'm not sure what to change in the mock dataset to reproduce the error, and it's not really possible to post my own dataset here. Any ideas?
  • @JolienJansen I should mention that for large data sets, I expect a data.table solution to be faster than a dplyr one.
  • That is good to know, I've only recently started working with bigger data sets, so tips like those are definitely welcome.