dbTalk Databases Forums  

How can I remove duplicate entries in a sql query?

comp.databases.ms-sqlserver comp.databases.ms-sqlserver


Discuss How can I remove duplicate entries in a sql query? in the comp.databases.ms-sqlserver forum.



Reply
 
Thread Tools Display Modes
  #11  
Old   
--CELKO--
 
Posts: n/a

Default Re: How can I remove duplicate entries in a sql query? - 03-05-2006 , 03:43 PM






Have you looked at Kx and Stonebreaker's StreamBase? They are databaes
designed for a high te,mproal flow of data. SQL is not always thje
right tool, in spite of what Bill Gates tells you.


Reply With Quote
  #12  
Old   
Steven Stern
 
Posts: n/a

Default Re: How can I remove duplicate entries in a sql query? - 03-05-2006 , 09:01 PM






On Sat, 04 Mar 2006 11:01:08 -0800, BarrySDCA wrote:

Quote:
It's exactly what I'm doing.....I'm recording hits to an advertising
engine, before they get to the media server. Some client players hit
more than once before they connect, causing a duplicate entry. I just
want to see it sorted out w/out the duplicates. I have the
distribution server logs for accurate playback reporting, I'm only
interested in seeing cleaned up hits.

If everything is the same except the timestamp, you can round that up to
the nearest second and use a SELECT DISTINCT.

Check your SQL server's date conversion options to find out how to get
different values from the timestamp field.



Reply With Quote
  #13  
Old   
Ed Prochak
 
Posts: n/a

Default Re: How can I remove duplicate entries in a sql query? - 03-07-2006 , 11:25 AM




Doug wrote:
Quote:
(but the ID field is still a poor crutch for a possibly weak design.)

hmmmm. i'd prefer to think of it as the table stores ALL hits to the
table, but this particular data consumer only wants to see ONE.
It isn't clear that this was ONE of several data consumers.. Even so
the ID pseudokey might not be the best choice.

Quote:
The
way I think about it, your natural key would be the composite of all
fields in the table; perhaps "natural", but certainly awkword.

I like figital's solution.

It is not clear from the previous posts that the natural key would
require ALL the columns of that table. If the table is really storing
events, then the key might be as simple as a timestamp, perhaps with an
event type. It doesn't have to be ALL of the columns.

My point is too many people immediately jump to ID fields as the PK.
Many times before they even sit down to think about what entity the
table represents. If you never look for the natural key, you are sure
not to find it.

HTH
ed



Reply With Quote
  #14  
Old   
Bill Karwin
 
Posts: n/a

Default Re: How can I remove duplicate entries in a sql query? - 03-07-2006 , 11:51 AM



"Ed Prochak" <edprochak (AT) gmail (DOT) com> wrote

Quote:
My point is too many people immediately jump to ID fields as the PK.
That's certainly a good point. However, I've worked on projects in which
the decision-makers wouldn't commit to _any_ combination of attributes that
would uniquely identify the entity. There were always cases where the value
in any column could be either non-unique, or else have no value specified
(i.e. NULL). Neither would they commit to any attributes that could be
reasonably stable and unchanging (though I understand that this is not
strictly necessary for a key).

So in those kinds of situations, I felt I had to create pseudokeys to have
any chance of the application working. Even if we know the best practices
for database modeling, the project on which we are working may have
constraints that don't allow us to follow those best practices.

Regards,
Bill K.




Reply With Quote
  #15  
Old   
Ed Prochak
 
Posts: n/a

Default Re: How can I remove duplicate entries in a sql query? - 03-07-2006 , 12:40 PM




Bill Karwin wrote:
Quote:
"Ed Prochak" <edprochak (AT) gmail (DOT) com> wrote in message
news:1141752316.933161.102260 (AT) i39g2000cwa (DOT) googlegroups.com...
My point is too many people immediately jump to ID fields as the PK.

That's certainly a good point. However, I've worked on projects in which
the decision-makers wouldn't commit to _any_ combination of attributes that
would uniquely identify the entity. There were always cases where the value
in any column could be either non-unique, or else have no value specified
(i.e. NULL). Neither would they commit to any attributes that could be
reasonably stable and unchanging (though I understand that this is not
strictly necessary for a key).
It's not necessarily something the End users or even managers should
make the final decision upon. I one really big project I had to good
fortune of working with a real Data Modeler. No she did not
programming, only modelling. She interviewed the users to find out what
data they had, what data they wanted and how they used it. She
organized it and created the ER model. The model included enough
information to design the DB model, which is when the PK's were
determined. Almost all of the cases you mentioned were encountered.
Each was handled by the DB designers. There are times when you create a
pseudokey and times when you split the logical table
maybe to model a more generic entity you create a pair of
parent/child tables where the parent lacks one of the fields that might
be NULL if only the Child table existed, or
maybe the entity really was 2 entities that looked similar, as an
address that represents the customer's location and the adddress that
represents the billing address. A DB that application that needed both
would be poorly served by one ADDRESS table.
What happens internally to the DB does not matter as long as you can
present the right data in the right combinations.

It is our job to engineer the software, not the end users and managers.

Quote:
So in those kinds of situations, I felt I had to create pseudokeys to have
any chance of the application working. Even if we know the best practices
for database modeling, the project on which we are working may have
constraints that don't allow us to follow those best practices.

Regards,
Bill K.
Sounds like you take pseudokeys as the last resort, which IMO is
exactly what they are. That approach is best practices. Keep it up.

Ed



Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.