How can I normalise MySQL tables with duplicate values

I have two tables, fruits and colors.

In the fruits table, the cid column references c_id in the colors table. The problem is that the colors table has duplicate color names.

Is there an effective way in MySQL to remove the duplicate color rows and update the cid in the fruits table accordingly, so that each color name appears only once and every fruit points to the surviving row?
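Before changing anything, it helps to list which names are actually duplicated: group by the name and keep only groups with more than one row. The sketch below runs that query through Python's built-in `sqlite3` module rather than MySQL so it can be executed end to end. The table and column names follow the first answer (`color`, `fruit`, `c_id`, `c_name`, `cid`); the sample rows and the helper names are hypothetical. The SELECT itself runs unchanged in MySQL.

```python
import sqlite3


def make_sample_db() -> sqlite3.Connection:
    """Hypothetical sample data: 'red' and 'yellow' each appear twice in color."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE color (c_id INTEGER PRIMARY KEY, c_name TEXT NOT NULL);
        CREATE TABLE fruit (f_id INTEGER PRIMARY KEY, f_name TEXT NOT NULL, cid INTEGER);
        INSERT INTO color VALUES (1, 'red'), (2, 'yellow'), (3, 'green'),
                                 (4, 'red'), (5, 'yellow');
        INSERT INTO fruit VALUES (1, 'apple', 1), (2, 'banana', 2), (3, 'cherry', 4),
                                 (4, 'lemon', 5), (5, 'lime', 3);
    """)
    return conn


def find_duplicate_names(conn: sqlite3.Connection) -> list:
    # Group rows by name; any group with more than one row is a duplicate.
    # MIN(c_id) is the row most of the answers below choose to keep.
    return conn.execute("""
        SELECT c_name, COUNT(*) AS copies, MIN(c_id) AS keep_id
        FROM color
        GROUP BY c_name
        HAVING COUNT(*) > 1
        ORDER BY c_name
    """).fetchall()


if __name__ == "__main__":
    print(find_duplicate_names(make_sample_db()))  # [('red', 2, 1), ('yellow', 2, 2)]
```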

Assuming there is a foreign key constraint between the tables, you first need to update the fruit table. Join it to color to get each fruit's color name, then use a correlated subquery to retrieve the minimum c_id for that name:

update fruit f
inner join color c on f.cid = c.c_id
set f.cid = (select min(c_id) from color c1 where c1.c_name = c.c_name)

Then you can safely delete the duplicate colors while keeping the one with the lowest c_id:

delete c
from color c
inner join color c1 on c1.c_name = c.c_name and c1.c_id < c.c_id
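The two steps can be run together inside one transaction, so a failure in the DELETE leaves the fruit table untouched. This is a sketch on SQLite via Python's `sqlite3` module, not MySQL: SQLite has no `UPDATE ... JOIN` or `DELETE ... JOIN` syntax, so both joins are rewritten as correlated subqueries with the same effect. In MySQL you would keep the JOIN forms above and wrap them in `START TRANSACTION;` ... `COMMIT;` (on InnoDB tables). The sample data and function names are hypothetical.

```python
import sqlite3


def make_sample_db() -> sqlite3.Connection:
    """Hypothetical sample data: 'red' and 'yellow' each appear twice in color."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE color (c_id INTEGER PRIMARY KEY, c_name TEXT NOT NULL);
        CREATE TABLE fruit (f_id INTEGER PRIMARY KEY, f_name TEXT NOT NULL, cid INTEGER);
        INSERT INTO color VALUES (1, 'red'), (2, 'yellow'), (3, 'green'),
                                 (4, 'red'), (5, 'yellow');
        INSERT INTO fruit VALUES (1, 'apple', 1), (2, 'banana', 2), (3, 'cherry', 4),
                                 (4, 'lemon', 5), (5, 'lime', 3);
    """)
    return conn


def merge_duplicate_colors(conn: sqlite3.Connection) -> None:
    with conn:  # commits if both statements succeed, rolls back on an exception
        # Step 1: point every fruit at the lowest c_id that has its color's name.
        # Without a foreign key, a dangling cid would become NULL here.
        conn.execute("""
            UPDATE fruit
            SET cid = (SELECT MIN(c1.c_id)
                       FROM color c
                       JOIN color c1 ON c1.c_name = c.c_name
                       WHERE c.c_id = fruit.cid)
            WHERE cid IS NOT NULL
        """)
        # Step 2: delete every color that has a lower c_id with the same name.
        conn.execute("""
            DELETE FROM color
            WHERE EXISTS (SELECT 1 FROM color c1
                          WHERE c1.c_name = color.c_name AND c1.c_id < color.c_id)
        """)


conn = make_sample_db()
merge_duplicate_colors(conn)
# [('apple', 1), ('banana', 2), ('cherry', 1), ('lemon', 2), ('lime', 3)]
print(conn.execute("SELECT f_name, cid FROM fruit ORDER BY f_id").fetchall())
# [(1, 'red'), (2, 'yellow'), (3, 'green')]
print(conn.execute("SELECT * FROM color ORDER BY c_id").fetchall())
```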

You could build a result set that holds, for each f_id, the minimum c_id among the colors that share its color's name:

SELECT fruit.f_id, fruit.f_name, min(c2.c_id) as c_id
FROM
    fruit
    INNER JOIN color c1 ON fruit.cid = c1.c_id
    INNER JOIN color c2 ON c1.c_name = c2.c_name
GROUP BY fruit.f_id, fruit.f_name

That's not the most efficient query, but it works. You can use it to correct your fruit table so that each fruit references a single color row when there are duplicates.
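One way to use that result set is to read it back and apply one UPDATE per fruit. The sketch below does this with Python's `sqlite3` module on hypothetical data (SQLite, not MySQL); the query is the one above with an `ORDER BY` added so the returned rows are deterministic. Because the rows are read first and the updates run afterwards, no single statement both reads from and modifies `fruit`.

```python
import sqlite3


def make_sample_db() -> sqlite3.Connection:
    """Hypothetical sample data: 'red' and 'yellow' each appear twice in color."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE color (c_id INTEGER PRIMARY KEY, c_name TEXT NOT NULL);
        CREATE TABLE fruit (f_id INTEGER PRIMARY KEY, f_name TEXT NOT NULL, cid INTEGER);
        INSERT INTO color VALUES (1, 'red'), (2, 'yellow'), (3, 'green'),
                                 (4, 'red'), (5, 'yellow');
        INSERT INTO fruit VALUES (1, 'apple', 1), (2, 'banana', 2), (3, 'cherry', 4),
                                 (4, 'lemon', 5), (5, 'lime', 3);
    """)
    return conn


def fix_fruit_from_result_set(conn: sqlite3.Connection) -> list:
    # The answer's query: minimum c_id among colors sharing each fruit's color name.
    rows = conn.execute("""
        SELECT fruit.f_id, fruit.f_name, MIN(c2.c_id) AS c_id
        FROM fruit
        INNER JOIN color c1 ON fruit.cid = c1.c_id
        INNER JOIN color c2 ON c1.c_name = c2.c_name
        GROUP BY fruit.f_id, fruit.f_name
        ORDER BY fruit.f_id
    """).fetchall()
    with conn:  # apply all updates in one transaction
        conn.executemany(
            "UPDATE fruit SET cid = ? WHERE f_id = ?",
            [(c_id, f_id) for f_id, _f_name, c_id in rows],
        )
    return rows


conn = make_sample_db()
print(fix_fruit_from_result_set(conn))
```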

After fixing your fruit table you can then run a query to see which colors are unused so you know what to delete:

SELECT color.*
FROM color
  LEFT OUTER JOIN fruit on color.c_id = fruit.cid
WHERE fruit.f_id IS NULL
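Once the fruit table is fixed, the same LEFT OUTER JOIN can drive the delete. Note that it removes every color no fruit uses, not only the duplicates, so a color that was never assigned (the hypothetical 'purple' below) disappears too. The sketch uses SQLite through Python's `sqlite3` module, with `NOT EXISTS` standing in for the join; in MySQL the multi-table form `DELETE color FROM color LEFT JOIN fruit ON color.c_id = fruit.cid WHERE fruit.f_id IS NULL;` expresses the same thing. Sample data and function names are hypothetical.

```python
import sqlite3


def make_sample_db() -> sqlite3.Connection:
    """Hypothetical data: duplicate 'red'/'yellow' plus an unused 'purple'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE color (c_id INTEGER PRIMARY KEY, c_name TEXT NOT NULL);
        CREATE TABLE fruit (f_id INTEGER PRIMARY KEY, f_name TEXT NOT NULL, cid INTEGER);
        INSERT INTO color VALUES (1, 'red'), (2, 'yellow'), (3, 'green'),
                                 (4, 'red'), (5, 'yellow'), (6, 'purple');
        INSERT INTO fruit VALUES (1, 'apple', 1), (2, 'banana', 2), (3, 'cherry', 4),
                                 (4, 'lemon', 5), (5, 'lime', 3);
    """)
    return conn


def repoint_fruit(conn: sqlite3.Connection) -> None:
    # Fix the fruit table first: lowest c_id per color name.
    with conn:
        conn.execute("""
            UPDATE fruit
            SET cid = (SELECT MIN(c1.c_id)
                       FROM color c
                       JOIN color c1 ON c1.c_name = c.c_name
                       WHERE c.c_id = fruit.cid)
            WHERE cid IS NOT NULL
        """)


def delete_unused_colors(conn: sqlite3.Connection) -> list:
    # The answer's query: colors with no matching fruit row.
    unused = conn.execute("""
        SELECT color.*
        FROM color
        LEFT OUTER JOIN fruit ON color.c_id = fruit.cid
        WHERE fruit.f_id IS NULL
        ORDER BY color.c_id
    """).fetchall()
    with conn:
        # Delete the same set of rows that the query above returned.
        conn.execute("""
            DELETE FROM color
            WHERE NOT EXISTS (SELECT 1 FROM fruit WHERE fruit.cid = color.c_id)
        """)
    return unused


conn = make_sample_db()
repoint_fruit(conn)
print(delete_unused_colors(conn))  # [(4, 'red'), (5, 'yellow'), (6, 'purple')]
print(conn.execute("SELECT * FROM color ORDER BY c_id").fetchall())
```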

First, you need to update fruit to only reference one of each color name:

UPDATE fruit AS f 
INNER JOIN color AS c ON f.cid = c.c_id
INNER JOIN (SELECT c_name, MIN(c_id) AS firstCid FROM color GROUP BY c_name) AS firsts
ON c.c_name = firsts.c_name
SET f.cid = firsts.firstCid
;

Note: this is similar to GMB's answer, but does not use a correlated subquery.

Then the duplicates can be cleaned up with something like this:

DELETE 
FROM colors 
WHERE c_id NOT IN (
     SELECT MIN(c_id) 
     FROM colors 
     GROUP BY c_name
   )

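This will also keep colors that no fruit uses, as long as their name is not duplicated.

However, MySQL does not allow a DELETE to select from the same table in a subquery (it fails with error 1093, "You can't specify target table 'colors' for update in FROM clause"), so the query has to be wrapped in a derived table to "trick" MySQL into materializing the IDs first: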

DELETE 
FROM colors 
WHERE c_id NOT IN (
     SELECT * 
     FROM (
         SELECT MIN(c_id) 
         FROM colors 
         GROUP BY c_name
     ) AS firstIds
    )
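To see how this differs from deleting unused colors, the sketch below runs this answer's DELETE (derived-table wrapper included) on hypothetical data that contains a color no fruit uses ('purple'); it survives because it is the lowest, and only, c_id for its name. It then adds a unique index on c_name so duplicates cannot come back, as suggested in the comments. It uses SQLite via Python's `sqlite3` module, which does not need the wrapper but accepts it, and names the table `color` as in the earlier answers; the fruit update is written as a correlated subquery because SQLite lacks `UPDATE ... JOIN`. In MySQL the index would be `ALTER TABLE color ADD UNIQUE (c_name);`.

```python
import sqlite3


def make_sample_db() -> sqlite3.Connection:
    """Hypothetical data: duplicate 'red'/'yellow' plus an unused 'purple'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE color (c_id INTEGER PRIMARY KEY, c_name TEXT NOT NULL);
        CREATE TABLE fruit (f_id INTEGER PRIMARY KEY, f_name TEXT NOT NULL, cid INTEGER);
        INSERT INTO color VALUES (1, 'red'), (2, 'yellow'), (3, 'green'),
                                 (4, 'red'), (5, 'yellow'), (6, 'purple');
        INSERT INTO fruit VALUES (1, 'apple', 1), (2, 'banana', 2), (3, 'cherry', 4),
                                 (4, 'lemon', 5), (5, 'lime', 3);
    """)
    return conn


def dedupe_and_lock(conn: sqlite3.Connection) -> None:
    with conn:
        # Repoint fruits at the first c_id per name (same effect as the UPDATE above).
        conn.execute("""
            UPDATE fruit
            SET cid = (SELECT MIN(c1.c_id)
                       FROM color c
                       JOIN color c1 ON c1.c_name = c.c_name
                       WHERE c.c_id = fruit.cid)
            WHERE cid IS NOT NULL
        """)
        # This answer's DELETE, keeping the derived-table wrapper.
        conn.execute("""
            DELETE FROM color
            WHERE c_id NOT IN (
                SELECT * FROM (
                    SELECT MIN(c_id) FROM color GROUP BY c_name
                ) AS firstIds
            )
        """)
    # Reject any future row whose name already exists.
    conn.execute("CREATE UNIQUE INDEX uq_color_name ON color (c_name)")


conn = make_sample_db()
dedupe_and_lock(conn)
# [(1, 'red'), (2, 'yellow'), (3, 'green'), (6, 'purple')]
print(conn.execute("SELECT * FROM color ORDER BY c_id").fetchall())
```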

You can achieve this in steps:

1. Delete the duplicates, keeping the row with the highest c_id for each name:

DELETE C1
FROM colors C1
JOIN colors C2 ON C2.c_name = C1.c_name
              AND C2.c_id > C1.c_id;

2. Renumber the c_id values:

UPDATE colors C1
JOIN
(
    SELECT @rownum:=@rownum+1 rownum, c_id, c_name
    FROM colors
    CROSS JOIN (select @rownum := 0) rn
) AS C2 ON C1.c_name = C2.c_name
SET C1.c_id = C2.rownum
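Step 1 here never touches the fruit table, so any fruit that pointed at a deleted c_id is left dangling (or, as a comment below notes, deleted or blocked if a foreign key exists). A quick orphan check after each step catches this. The sketch below uses Python's `sqlite3` module with hypothetical data and no foreign key; its DELETE keeps the highest c_id per name, as this answer's does, but is written with a correlated subquery because SQLite has no `DELETE ... JOIN`. The orphan query itself runs unchanged in MySQL.

```python
import sqlite3


def make_sample_db() -> sqlite3.Connection:
    """Hypothetical sample data: 'red' and 'yellow' each appear twice in color."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE color (c_id INTEGER PRIMARY KEY, c_name TEXT NOT NULL);
        CREATE TABLE fruit (f_id INTEGER PRIMARY KEY, f_name TEXT NOT NULL, cid INTEGER);
        INSERT INTO color VALUES (1, 'red'), (2, 'yellow'), (3, 'green'),
                                 (4, 'red'), (5, 'yellow');
        INSERT INTO fruit VALUES (1, 'apple', 1), (2, 'banana', 2), (3, 'cherry', 4),
                                 (4, 'lemon', 5), (5, 'lime', 3);
    """)
    return conn


def delete_keeping_highest(conn: sqlite3.Connection) -> None:
    # Step 1 only: remove each color that has a higher c_id with the same name.
    with conn:
        conn.execute("""
            DELETE FROM color
            WHERE EXISTS (SELECT 1 FROM color c2
                          WHERE c2.c_name = color.c_name AND c2.c_id > color.c_id)
        """)


def find_orphaned_fruit(conn: sqlite3.Connection) -> list:
    # Fruits whose cid no longer matches any color row.
    return conn.execute("""
        SELECT f.f_id, f.f_name, f.cid
        FROM fruit f
        LEFT JOIN color c ON f.cid = c.c_id
        WHERE f.cid IS NOT NULL AND c.c_id IS NULL
        ORDER BY f.f_id
    """).fetchall()


conn = make_sample_db()
print(find_orphaned_fruit(conn))  # []
delete_keeping_highest(conn)
print(find_orphaned_fruit(conn))  # [(1, 'apple', 1), (2, 'banana', 2)]
```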

Comments
  • You could just run a query for every color you remove: UPDATE fruits SET cid='1' WHERE cid='7', then remove c_id 7 from the colors table. Repeat for each color until there are no duplicates, then don't let there be duplicates again (make c_name unique). If it were me, I would automate this using PHP (my experience) or some other language; it would be pretty trivial.
  • Thank you! The Update query worked just fine, but the delete query throws an error: "c" is not valid at this position, expecting : EOF, ':'
  • @Csaba: I updated the delete query, please let me know if it works better now.
  • Now the code looks valid, but when I execute the code I get: "Table 'c' is specified twice, both as a target for 'DELETE' and a separate source for data"
  • @Csaba: ok, I changed it to a JOINed query. I tested it and this seems to work fine.
  • I think you meant to join c1 and c2 on c_name
  • @Uueerdo Sure did. Fixed. Yikes.
  • If fruit.cid has a foreign key constraint with ON DELETE CASCADE, that will end up wiping a lot of fruit data; if it has no constraint at all, information such as a banana being yellow will need to be manually reproduced.
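The loop described in the first comment (repoint fruits from each extra c_id to the one being kept, delete the extra color row, then make c_name unique) is straightforward to script. The commenter suggested PHP; the sketch below instead uses Python's `sqlite3` module against hypothetical data, with table names following the earlier answers. It keeps the lowest c_id per name and runs all the per-color changes in one transaction; the function and index names are illustrative.

```python
import sqlite3


def make_sample_db() -> sqlite3.Connection:
    """Hypothetical sample data: 'red' and 'yellow' each appear twice in color."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE color (c_id INTEGER PRIMARY KEY, c_name TEXT NOT NULL);
        CREATE TABLE fruit (f_id INTEGER PRIMARY KEY, f_name TEXT NOT NULL, cid INTEGER);
        INSERT INTO color VALUES (1, 'red'), (2, 'yellow'), (3, 'green'),
                                 (4, 'red'), (5, 'yellow');
        INSERT INTO fruit VALUES (1, 'apple', 1), (2, 'banana', 2), (3, 'cherry', 4),
                                 (4, 'lemon', 5), (5, 'lime', 3);
    """)
    return conn


def merge_by_loop(conn: sqlite3.Connection) -> None:
    # One (name, id-to-keep) pair per duplicated color name.
    dupes = conn.execute("""
        SELECT c_name, MIN(c_id) FROM color
        GROUP BY c_name HAVING COUNT(*) > 1
    """).fetchall()
    with conn:  # all per-color changes commit together or not at all
        for name, keep_id in dupes:
            extra_ids = [row[0] for row in conn.execute(
                "SELECT c_id FROM color WHERE c_name = ? AND c_id <> ?",
                (name, keep_id),
            )]
            for old_id in extra_ids:
                # The comment's pattern: move fruits off old_id, then drop that color.
                conn.execute("UPDATE fruit SET cid = ? WHERE cid = ?", (keep_id, old_id))
                conn.execute("DELETE FROM color WHERE c_id = ?", (old_id,))
    # "Don't let there be duplicates again": enforce uniqueness on c_name.
    conn.execute("CREATE UNIQUE INDEX uq_color_name ON color (c_name)")


conn = make_sample_db()
merge_by_loop(conn)
print(conn.execute("SELECT * FROM color ORDER BY c_id").fetchall())
```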