dbTalk Databases Forums  

Duplicate vs. Original, Calculation question

comp.databases.filemaker comp.databases.filemaker


Discuss Duplicate vs. Original, Calculation question in the comp.databases.filemaker forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
Teresa K.
 
Posts: n/a

Default Duplicate vs. Original, Calculation question - 10-21-2005 , 12:47 PM






Windows XP
Filemaker Pro: 6.0x

I've read most of the issues dealing with Duplicates, deleting them,
identifying them, etc...

I currently have in place the fields and self-relationships to identify
duplicate database entries per instructions I read on-line, however,
how do I set things up to identify the first instance of a file as an
Original, and all subsequent files that are duplicates as Duplicates in
my Duplicate Results.

I have a database with over 6100 records. This is a child database
that holds detailed information on documents that apply to Worksheets.


Worksheet A has Document_111, Document_123 and Document_222 assigned to
it, which are the original documents. However, Document_111 is showing
up 10 times in my ACTUAL DOC AFFECTED database because of a user entry
mix-up, and thus showing that many repeats in my portal on Worksheet A.

Same thing with Worksheet B. It has Document_333, Document_345, and
Document_444 assigned to it, which are the original documents. However,
Document_345 is showing up 14 times in my ACTUAL DOC AFFECTED database
because of a user entry mix-up, and thus showing that many repeats in
it's portal of Documents affected.

I'm in need a calculation that will mark the first instance of the
document key that comes up as "Original", and any others that come up
as Duplicate. Would this be done as a script instead of calculated
fields?

Here are the calculation fields I have in place:

Duplicate = Unique Text calculation
Duplicate Count = Count(Actual Doc Affected:uplicate)
Duplicate Result = If (Duplicate Count > 1,
"Duplicate", "Original)

Here is my relationship I have in place:

Duplicate=:uplicate (Related file being Actual Doc Affected)

In my example above with the calculations & relationships I'm using,
the field "Duplicate Result" is show all Documents_111 as duplicate,
including the first instance. The same for Document_345.


Reply With Quote
  #2  
Old   
Teresa K.
 
Posts: n/a

Default Re: Duplicate vs. Original, Calculation question - 10-21-2005 , 01:19 PM






You know, once I wrote all this stuff, I would finally find the key
words that would help me solve this. Just for the record, in case
anyone else has this problem and is searching for help, these are the
steps I used to remove duplicate entries.

1) You still need a Duplicate_Key field that is a Calculation (in my
case, it was based on a text calculation)that uniquely identifies a
record based on general critiera.

Example
DUPLICATE = "DOC_NAME & DOC_PARAGRAPH & DOC_SIZE & DOC_TYPE"

2) Sort your database that contains the duplicate records.

3) Export the database, however, select the "Summarize Button" first.
Then, in the "Summarize By" dialog box, select the field you want to
use as the break field. A checkmark appears to the left of the field.

The Summarize Button displays fields that the file is currently sorted
by. It will export one record for each unique value in this field,
disgarding the duplicates based on your key.

That's it, unless someone has anything else to add.

Thanks for letting me share and working this out.

- Teresa K.


Reply With Quote
  #3  
Old   
42
 
Posts: n/a

Default Re: Duplicate vs. Original, Calculation question - 10-21-2005 , 01:22 PM



In article <1129916820.900778.169720 (AT) g49g2000cwa (DOT) googlegroups.com>,
teresa.kabourek (AT) ngc (DOT) com says...
Quote:
Windows XP
Filemaker Pro: 6.0x

I've read most of the issues dealing with Duplicates, deleting them,
identifying them, etc...

I currently have in place the fields and self-relationships to identify
duplicate database entries per instructions I read on-line, however,
how do I set things up to identify the first instance of a file as an
Original, and all subsequent files that are duplicates as Duplicates in
my Duplicate Results.

I have a database with over 6100 records. This is a child database
that holds detailed information on documents that apply to Worksheets.


Worksheet A has Document_111, Document_123 and Document_222 assigned to
it, which are the original documents. However, Document_111 is showing
up 10 times in my ACTUAL DOC AFFECTED database because of a user entry
mix-up, and thus showing that many repeats in my portal on Worksheet A.

Same thing with Worksheet B. It has Document_333, Document_345, and
Document_444 assigned to it, which are the original documents. However,
Document_345 is showing up 14 times in my ACTUAL DOC AFFECTED database
because of a user entry mix-up, and thus showing that many repeats in
it's portal of Documents affected.

I'm in need a calculation that will mark the first instance of the
document key that comes up as "Original", and any others that come up
as Duplicate. Would this be done as a script instead of calculated
fields?

Here are the calculation fields I have in place:

Duplicate = Unique Text calculation
Duplicate Count = Count(Actual Doc Affected:uplicate)
Duplicate Result = If (Duplicate Count > 1,
"Duplicate", "Original)

Here is my relationship I have in place:

Duplicate=:uplicate (Related file being Actual Doc Affected)

In my example above with the calculations & relationships I'm using,
the field "Duplicate Result" is show all Documents_111 as duplicate,
including the first instance. The same for Document_345.
You are most of the way there.

Naturally the "Duplicate Count" calc is the same for every duplicate. If
there are 5 duplicates, each duplicate sees the other 4, so count
duplicates is always 5.

Use the fact that when accessing a one-many relationship without the use
of aggregate functions (like "count"), only the first related record is
operated on. To do this you need a truly unique field, like a serial
number, that will be different for each record, even duplicated ones.
(An autoenter serial number works great.)

if myuniqueserial = actualdocduplicated::myuniqueserial then you are on
the 'first' record, otherwise you are on a duplicate.

to illustrate:

take a simple table:

{serial, id}

We want to ensure that each "id" only occurs once... so we:

define a relationship from id to itself called: detectdupes

and lets make some records:
{serial, id}
123, XYZ
124, XYZ
125, XYZ
126, ZZZ
127, YYY
128, ZZZ

define countdupes as a calc = count(detectdupes::id);
for each of these: countdupes will evaluate:
{serial, id, countdupes}

123, XYZ, 3
124, XYZ, 3
125, XYZ, 3
126, ZZZ, 2
127, YYY, 1
128, ZZZ, 2

which tells each record the number of duplicates but not which one is
the "original" (or first).

but what we really need is:
define status as a calc = if detectdupes::serial = serial, "Original",
"Duplicate")

{serial, id, countdupes, status} // coment

123, XYZ, 3, original
124, XYZ, 3, duplicate
125, XYZ, 3, duplicate
126, ZZZ, 2, original
127, YYY, 1, original
128, ZZZ, 2, duplicate

-----------

When the 123 record looks through the detectduplicates portal it sees:
recrods {123, 124, 125}, but detectduplicates:id returns 123 so it knows
the first related record is itself.

And of course, when 124 record looks the the detectduplicates portal it
also sees records: {123, 124, 125}, and detectduplicates:id still
returns 123, so it knows that the first related record is not itself.

HTH,
Dave


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.