dbTalk Databases Forums  

SELECT DISTINCT and compound primary key.

comp.databases.ms-sqlserver comp.databases.ms-sqlserver


Discuss SELECT DISTINCT and compound primary key. in the comp.databases.ms-sqlserver forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Richard
 
Posts: n/a

Default SELECT DISTINCT and compound primary key. - 06-07-2011 , 07:07 AM






Hello,

I have a table (Main) with compound key and second, temporary table
(Temp) with same fields but without PK. Main table key is 3 columns
( SourceFileHash, FileHash, PathHash ). There is also 4 other columns
in each table.
My intent is to use temp table to store batches of records with
duplicates and inserting only unique records to main table after every
batch is finished ( batches are inserted using SqlBulkCopy ).

How can I select distinct records on Temp table but using only 3 PK
columns ?

Here is query that I created - it won't work because it is trying to
insert duplicate records to main:

SELECT temp.sourceFileHash, temp.pathHash, temp.fileHash, temp.
[length], temp.creationDate, temp.modificationDate,
temp.lastAccessDate, temp.line
FROM CatalogEntriesTemp temp
LEFT OUTER JOIN CatalogEntries main ON main.pathHash = temp.pathHash
and main.fileHash = temp.fileHash AND main.sourceFileHash =
temp.sourceFileHash
WHERE main.pathHash IS NULL AND main.fileHash IS NULL AND
main.sourceFileHash is null

Thanks for any help in advance.

Reply With Quote
  #2  
Old   
Bob Barrows
 
Posts: n/a

Default Re: SELECT DISTINCT and compound primary key. - 06-07-2011 , 07:54 AM






Richard wrote:
Quote:
Hello,

I have a table (Main) with compound key and second, temporary table
(Temp) with same fields but without PK. Main table key is 3 columns
( SourceFileHash, FileHash, PathHash ). There is also 4 other columns
in each table.
My intent is to use temp table to store batches of records with
duplicates and inserting only unique records to main table after every
batch is finished ( batches are inserted using SqlBulkCopy ).

How can I select distinct records on Temp table but using only 3 PK
columns ?

Here is query that I created - it won't work because it is trying to
insert duplicate records to main:

SELECT temp.sourceFileHash, temp.pathHash, temp.fileHash, temp.
[length], temp.creationDate, temp.modificationDate,
temp.lastAccessDate, temp.line
FROM CatalogEntriesTemp temp
LEFT OUTER JOIN CatalogEntries main ON main.pathHash = temp.pathHash
and main.fileHash = temp.fileHash AND main.sourceFileHash =
temp.sourceFileHash
WHERE main.pathHash IS NULL AND main.fileHash IS NULL AND
main.sourceFileHash is null

Thanks for any help in advance.
You need to create a CTE or view that uses GROUP BY to select the unique
rows from CatalogEntriesTemp, aggregating the non-key columns so that a
single row is retrieved.

Reply With Quote
  #3  
Old   
Fred.
 
Posts: n/a

Default Re: SELECT DISTINCT and compound primary key. - 06-07-2011 , 08:13 AM



On Jun 7, 8:07*am, Richard <richarde... (AT) gmail (DOT) com> wrote:
Quote:
Hello,

I have a table (Main) with compound key and second, temporary table
(Temp) with same fields but without PK. Main table key is 3 columns
( SourceFileHash, FileHash, PathHash ). There is also 4 other columns
in each table.
My intent is to use temp table to store batches of records with
duplicates and inserting only unique records to main table after every
batch is finished ( batches are inserted using SqlBulkCopy ).

How can I select distinct records on Temp table but using only 3 PK
columns ?

Here is query that I created - it won't work because it is trying to
insert duplicate records to main:

SELECT temp.sourceFileHash, temp.pathHash, temp.fileHash, temp.
[length], temp.creationDate, temp.modificationDate,
temp.lastAccessDate, temp.line
* * * * * * * * FROM CatalogEntriesTemp temp
* * * * * * * * LEFT OUTER JOIN CatalogEntries main ON main.pathHash = temp.pathHash
and main.fileHash = temp.fileHash AND main.sourceFileHash =
temp.sourceFileHash
* * * * * * * * WHERE main.pathHash IS NULL AND main.fileHash IS NULL AND
main.sourceFileHash is null

Thanks for any help in advance.
It looks to me as if you need to look at your application and
assumptions about your data more closely.

If the key fields actually are hash values, the key duplication could
be coming up because of unresolved hash colisions, So you need first
to make sure that hash the values actually are getting rehashed when
necessary to point to the correct entries.

If the hashing is correct, then, as is likely, your data aquisition
proceess has records which match on the three key fields but have
different values in the other four fields, then before you can get
down to distinct records you need to establish which values for these
other four fields are to be treated as correct in your application.
Once you have completed your application definition you will be in a
position to work out how to write the query.

Fred.

Reply With Quote
  #4  
Old   
Erland Sommarskog
 
Posts: n/a

Default Re: SELECT DISTINCT and compound primary key. - 06-07-2011 , 04:37 PM



Richard (richardemig (AT) gmail (DOT) com) writes:
Quote:
I have a table (Main) with compound key and second, temporary table
(Temp) with same fields but without PK. Main table key is 3 columns
( SourceFileHash, FileHash, PathHash ). There is also 4 other columns
in each table.
My intent is to use temp table to store batches of records with
duplicates and inserting only unique records to main table after every
batch is finished ( batches are inserted using SqlBulkCopy ).

How can I select distinct records on Temp table but using only 3 PK
columns ?

Here is query that I created - it won't work because it is trying to
insert duplicate records to main:
WITH numbered AS (
SELECT sourceFileHash, pathHash, fileHash, length, creationDate,
modificationDate, lastAccessDate, line,
row_number() OVER
(PARTITION BY sourceFileHash, pathHash, fileHash
ORDER BY ?????) as rowno
FROM CatalogEntriesTemp
)
SELECT sourceFileHash, pathHash, fileHash, length, creationDate,
modificationDate, lastAccessDate, line
FROM numbered
WHERE rowno = 1

You need to fill in the ORDER BY clause to determine which of the
rows with the same duplicate key you want.




--
Erland Sommarskog, SQL Server MVP, esquel (AT) sommarskog (DOT) se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx

Reply With Quote
  #5  
Old   
Richard
 
Posts: n/a

Default Re: SELECT DISTINCT and compound primary key. - 06-10-2011 , 09:50 AM



Quote:
You need to fill in the ORDER BY clause to determine which of the
rows with the same duplicate key you want.
Thanks Erland.

Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.