IBM i SQL

DB2 SQL DELETE DUPLICATE DATA FROM TABLE.


There are multiple ways to Delete duplicate data from A Table using SQL Query.Here we are going to talk about using Correlated subqueries to Delete duplicate data.


Let say we have a EMPLOY table as follow

id First_name Technology Salary RRN
1 Jhon AS400 70000 1
2 Sandy JAVA 75000 2
1 Ted SQL 60000 3
3 Sumit AS400 65000 4
4 Andy PHP 80000 5
1 Ted CMS 76000 6
2 Sandy HTML 50000 7
4 Lilli DOTNET 70000 8
1 Ron JAVA 80000 9
1 David DBA 88000 10

The in above EMPLOYEE table multiple records are duplicate. To remove all duplicate records, we can write very simple correlated subquery.

Delete from EMPLOYEE E1 
where RRN(E1) > (Select MIN(RRN(E2)) from EMPLOYEE E2 where E2.id = E1.id)

To delete duplicate records, we need to find a unique value for each record of table so we are using SQL RRN() function to get relative record number for each record.

To complete the execution of subquery “Select MIN(RRN(E2)) from EMPLOYEE E2 where E2.id = E1.idSQL need the value of “E1.id” from outer query. So SQL will read the first record (RRN = 1) for outer query and get the value of E1.id = 1 and pass this value to subquery. Now the subquery will be executed as

Select MIN(RRN(E2)) from EMPLOYEE E2 where E2.id = 1 

If RRN found in outer query is bigger than RRN found in subquery than SQL will delete that record from EMPLOYEE table, If RRN found in outer query is smaller or equal to RRN found in subquery than the system will skip to next record of outer query. This process will go on for each and every record of outer query. At last only one record for each id with minimum RRN will be remaining in table.

So Final output will be

id First_name Technology Salary RRN
1 Jhon AS400 70000 1
2 Sandy JAVA 75000 2
3 Sumit AS400 65000 4
4 Andy PHP 80000 5

View Comments