How to replace duplicate questions/values in a row
I am about to spread a table, however I encountered a problem with my data. The data is based on a questionnaire, and the questions are in 1 column while the answers are in the next column. This data file contains about 20000 questionnaires, all pasted underneath each other.
It looks like this:
*Participant* | *Question* | *Answer*
Paul | Age | 15
Paul | City | Amsterdam
Paul | Pet_name | Butterfly
Paul | Fav_color | Pink
Paul | Parent_name | Hank
Paul | Parent_name | Mary
Adam | Age | 78
Adam | City | LA
Adam | Pet_name | Crocodile
Adam | Fav_color | Purple
Adam | Parent_name | Pete
Adam | Parent_name | Peter
The problem is: I can't spread when two questions have the same name, in this case "Parent_name".
So preferably I'd like to replace the second occurrence of Parent_name, per Participant, with something like Parent2_name or Parent_name2.
What I've tried to do is identify duplicated values with the duplicated() function; however, since all questions are repeated, it will just flag everything from the second participant on as duplicated.
To recreate my example data:
Participant <- c('Paul', 'Paul', 'Paul', 'Paul', 'Paul', 'Paul',
                 'Adam', 'Adam', 'Adam', 'Adam', 'Adam', 'Adam')
Question <- c('Age', 'City', 'Pet_name', 'Fav_color', 'Parent_name', 'Parent_name',
              'Age', 'City', 'Pet_name', 'Fav_color', 'Parent_name', 'Parent_name')
Answer <- c('15', 'Amsterdam', 'Butterfly', 'Pink', 'Hank', 'Mary',
            '78', 'LA', 'Crocodile', 'Purple', 'Pete', 'Peter')
df <- data.frame(Participant, Question, Answer)
So the final product would look like:
*Participant* | *Question* | *Answer*
Paul | Age | 15
Paul | City | Amsterdam
Paul | Pet_name | Butterfly
Paul | Fav_color | Pink
Paul | Parent_name | Hank
Paul | Parent2_name | Mary
Adam | Age | 78
Adam | City | LA
Adam | Pet_name | Crocodile
Adam | Fav_color | Purple
Adam | Parent_name | Pete
Adam | Parent2_name | Peter
Group by Participant and Question, and append the row number to Question if there is more than one row. This will work for any Question with duplicated values.
library(dplyr)

df %>%
  group_by(Participant, Question) %>%
  mutate(Question1 = if (n() > 1) paste0(Question, row_number()) else Question) %>%
  ungroup %>%
  select(-Question)

#   Participant Answer    Question1
#   <chr>       <chr>     <chr>
# 1 Paul        15        Age
# 2 Paul        Amsterdam City
# 3 Paul        Butterfly Pet_name
# 4 Paul        Pink      Fav_color
# 5 Paul        Hank      Parent_name1
# 6 Paul        Mary      Parent_name2
# 7 Adam        78        Age
# 8 Adam        LA        City
# 9 Adam        Crocodile Pet_name
#10 Adam        Purple    Fav_color
#11 Adam        Pete      Parent_name1
#12 Adam        Peter     Parent_name2
df <- data.frame(Participant, Question, Answer, stringsAsFactors = FALSE)
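For comparison, the same idea can be sketched in base R without extra packages. This is only a sketch, assuming the rows of each participant are in questionnaire order; the names `flag_question_only`, `flag_pair` and `occurrence` are introduced here for illustration and are not part of the answer above. It also shows why duplicated() on the Question column alone flags too much, while duplicated() on the Participant/Question pair flags only the true repeats. Only the second and later occurrences get a suffix, which gives the Parent_name / Parent_name2 naming mentioned in the question.

```r
# Rebuild the example data so the sketch is self-contained.
df <- data.frame(
  Participant = rep(c("Paul", "Adam"), each = 6),
  Question = rep(c("Age", "City", "Pet_name", "Fav_color",
                   "Parent_name", "Parent_name"), times = 2),
  Answer = c("15", "Amsterdam", "Butterfly", "Pink", "Hank", "Mary",
             "78", "LA", "Crocodile", "Purple", "Pete", "Peter"),
  stringsAsFactors = FALSE
)

# duplicated() on Question alone also flags every row of the second participant;
# applied to the Participant/Question pair it flags only repeated questions.
flag_question_only <- duplicated(df$Question)
flag_pair <- duplicated(df[c("Participant", "Question")])

# Running count of each question within each participant (1, 2, ...).
occurrence <- ave(seq_along(df$Question), df$Participant, df$Question,
                  FUN = seq_along)

# Append the count only from the second occurrence onward.
df$Question <- ifelse(occurrence > 1,
                      paste0(df$Question, occurrence),
                      df$Question)
```

After this, every Participant/Question pair is unique, so the table can be spread.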
Doing this with data.table:
library(data.table)

Participant <- c('Paul', 'Paul', 'Paul', 'Paul', 'Paul', 'Paul',
                 'Adam', 'Adam', 'Adam', 'Adam', 'Adam', 'Adam')
Question <- c('Age', 'City', 'Pet_name', 'Fav_color', 'Parent_name', 'Parent_name',
              'Age', 'City', 'Pet_name', 'Fav_color', 'Parent_name', 'Parent_name')
Answer <- c('15', 'Amsterdam', 'Butterfly', 'Pink', 'Hank', 'Mary',
            '78', 'LA', 'Crocodile', 'Purple', 'Pete', 'Peter')
df <- data.table(Participant, Question, Answer)
Set a new column with an ID by participant and question, then append it to the original question where it is greater than one, and then remove the additional column.
df[, id := seq_len(.N), by = .(Participant, Question)]
df[id != 1, Question := paste0(Question, id)]
df[, id := NULL]
> df
    Participant     Question    Answer
 1:        Paul          Age        15
 2:        Paul         City Amsterdam
 3:        Paul     Pet_name Butterfly
 4:        Paul    Fav_color      Pink
 5:        Paul  Parent_name      Hank
 6:        Paul Parent_name2      Mary
 7:        Adam          Age        78
 8:        Adam         City        LA
 9:        Adam     Pet_name Crocodile
10:        Adam    Fav_color    Purple
11:        Adam  Parent_name      Pete
12:        Adam Parent_name2     Peter
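With the question names now unique within each participant, the table can be spread to one row per participant. A minimal sketch, assuming the goal is the wide layout the question is working towards: `dcast()` is data.table's reshaping function, and `wide` is a name introduced here for illustration.

```r
library(data.table)

# Rebuild the example data so the sketch is self-contained.
df <- data.table(
  Participant = rep(c("Paul", "Adam"), each = 6),
  Question = rep(c("Age", "City", "Pet_name", "Fav_color",
                   "Parent_name", "Parent_name"), times = 2),
  Answer = c("15", "Amsterdam", "Butterfly", "Pink", "Hank", "Mary",
             "78", "LA", "Crocodile", "Purple", "Pete", "Peter")
)

# Number repeated questions within each participant and rename the repeats.
df[, id := seq_len(.N), by = .(Participant, Question)]
df[id != 1, Question := paste0(Question, id)]
df[, id := NULL]

# One row per participant, one column per (now unique) question.
wide <- dcast(df, Participant ~ Question, value.var = "Answer")
```

Because each Participant/Question pair now occurs once, `dcast()` does not need an aggregation function here.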
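library(dplyr)
library(tidyr)  # spread() comes from tidyr

df %>%
  # Group by Participant as well, so numbering restarts for each participant
  group_by(Participant, Question) %>%
  mutate(Index = rank(Question, ties.method = "first")) %>%
  ungroup() %>%
  mutate(Question = paste(Question, Index, sep = "_")) %>%
  select(-Index) %>%
  spread(Question, Answer)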
This uses rank to sequence the variables and should work no matter the number of duplicates or their order.
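In current versions of tidyr, spread() is superseded by pivot_wider(), so the same reshaping can also be sketched as below. This is a sketch, assuming the numbering should restart for each participant; `Index` is reused from the code above and `wide` is a name introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data so the sketch is self-contained.
df <- data.frame(
  Participant = rep(c("Paul", "Adam"), each = 6),
  Question = rep(c("Age", "City", "Pet_name", "Fav_color",
                   "Parent_name", "Parent_name"), times = 2),
  Answer = c("15", "Amsterdam", "Butterfly", "Pink", "Hank", "Mary",
             "78", "LA", "Crocodile", "Purple", "Pete", "Peter"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(Participant, Question) %>%
  mutate(Index = row_number()) %>%          # 1, 2, ... within each pair
  ungroup() %>%
  mutate(Question = paste(Question, Index, sep = "_")) %>%
  select(-Index) %>%
  pivot_wider(names_from = Question, values_from = Answer)
```

Every question gets a suffix here (Age_1, Parent_name_1, Parent_name_2), which keeps the column names predictable regardless of how many repeats a question has.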
- Hey, this works on my mock data; however, in my actual data I have a couple more columns that cause an error, I think. I tried this code, based on your code: group_by(Nummer, Leeftijd, Geslacht, Woonplaats, Datum, Vraag) %>% mutate(Vraag1 = if (n() > 1) paste0(Vraag, row_number()) else Question) And I get this error: "Error: Column Vraag1 must be length 1 (the group size), not 12". Sorry if the column names are in Dutch, I can translate if that makes it easier. All of these extra columns are like the Participant column btw, they repeat the values 6 times.
- @JolienJansen please add another example if there is a different part of your data causing an error. Column names in any language are fine - the content does not concern us as much as the structure.
- @cddt Will do that now
- @JolienJansen Why are you grouping by other columns? Also is your column name
- Hey, so Question == Vraag in Dutch, so in my own dataset the 'Question' column is called 'Vraag'. Even if I only group by 'Participant', in my own dataset it gives me the error "Error: Column Vraag1 must be length 1 (the group size), not 12". I tried to recreate my own dataset as best as possible, but I only get the error on my own data, not the mock dataset. I am not sure what to recreate in my mock dataset to get the error, and it's not really possible to just post my own dataset on here. Any ideas?
- @JolienJansen I should mention that for large data sets, I expect a data.table solution to be faster than a
- That is good to know, I've only recently started working with bigger data sets, so tips like those are definitely welcome.