![]() | |
![]() |
| | Thread Tools | Display Modes |
#1
| |||
| |||
|
#2
| |||
| |||
|
|
Data Cleansing: In the example (SQL Server DDL below) there are two tables - ExampleCustomer, and ExampleCar. ExampleCar is a lookup table. The ExampleCustomer table has a foreign key to the ExampleCar table. There should be two rows in there, "Ford Focus", and "Ford Galaxy". This table populates a drop-down list in the application, ordered by CarID. So a user is adding a new Customer record, selects a car from the list, and hey Presto! Someone came along and messed with the data, so now there are two rows for each car. Yes, I know I should have set the CarType column unique, but I didn't. The user has done what users always do - work around the problem. So she has added some customers using the first occurrence of the car in the drop-down list, except for the last record when she unaccountably selected the second occurrence. I have been tasked with cleaning up the data. The object is to delete all duplicate rows, but without leaving any orphaned references. At the end of the exercise the rows in the tables should be: CarID CarType 1 Ford Focus 3 Ford Galaxy PersonName CarID Adam Smith 1 Ben Smith 1 Colin Smith 3 Dave Smith 3 Edward Smith 3 Fred Smith 3 Note that the CarID for Fred Smith has been updated to the predominating row for the Ford Galaxy type. Is this clear? Can anyone see a way of doing this in a small number of queries? I could think of a way of doing it using cursors etc., but the prevailing wisdom is that there's ALWAYS a way to do it without using cursors. Thanks Edward DDL ========================= if exists (select * from dbo.sysobjects where id = object_id(N'[dbo]. [ExampleCar]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[ExampleCar] GO if exists (select * from dbo.sysobjects where id = object_id(N'[dbo]. [ExampleCustomer]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [dbo].[ExampleCustomer] GO CREATE TABLE [dbo].[ExampleCar] ( [CarID] [int] IDENTITY (1, 1) NOT NULL , [CarType] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ) ON [PRIMARY] GO CREATE TABLE [dbo].[ExampleCustomer] ( [PersonName] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [CarID] [int] NULL ) ON [PRIMARY] GO INSERT INTO ExampleCar (CarType) VALUES ('Ford Focus') INSERT INTO ExampleCar (CarType) VALUES ('Ford Focus') INSERT INTO ExampleCar (CarType) VALUES ('Ford Galaxy') INSERT INTO ExampleCar (CarType) VALUES ('Ford Galaxy') INSERT INTO ExampleCustomer (PersonName, CarID) VALUES ('Adam Smith', 1) INSERT INTO ExampleCustomer (PersonName, CarID) VALUES ('Ben Smith', 1) INSERT INTO ExampleCustomer (PersonName, CarID) VALUES ('Colin Smith', 3) INSERT INTO ExampleCustomer (PersonName, CarID) VALUES ('Dave Smith', 3) INSERT INTO ExampleCustomer (PersonName, CarID) VALUES ('Edward Smith', 3) INSERT INTO ExampleCustomer (PersonName, CarID) VALUES ('Fred Smith', 4) |
#3
| |||
| |||
|
|
I have been tasked with cleaning up the data. The object is to delete all duplicate rows, but without leaving any orphaned references. At the end of the exercise the rows in the tables should be: CarID CarType 1 Ford Focus 3 Ford Galaxy PersonName CarID Adam Smith 1 Ben Smith 1 Colin Smith 3 Dave Smith 3 Edward Smith 3 Fred Smith 3 Note that the CarID for Fred Smith has been updated to the predominating row for the Ford Galaxy type. |
![]() |
| Thread Tools | |
| Display Modes | |
| |