dbTalk Databases Forums  

Removing Duplicates In Filemaker The Easy Way

comp.databases.filemaker comp.databases.filemaker


Discuss Removing Duplicates In Filemaker The Easy Way in the comp.databases.filemaker forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
squeed2000@yahoo.com
 
Posts: n/a

Default Removing Duplicates In Filemaker The Easy Way - 06-15-2007 , 09:23 PM






If you need to be able to identify duplicate records in your
Filemaker
database, read on, here is a simple way to accomplish this with 5
minutes worth of work and all done within a script.
One of the things that Filemaker Pro (all versions) is missing is a
good duplicate record identification and removal/merger feature.
I've seen a lot of posts here on how to identify and remove
duplicates
but they all seem very complicated and involve linking and alternate
tables.

I've figured out a way to create a simple duplicate tagging script
that you can use to identify duplcates in your Filemaker database.

For this example, the database is called "DATA.fp7" and the script is
called "Flag Dupes". This script determines what records are
duplicate records based on records having an identical address.
Again, this is a simple example, if you wanted to identify duplicate
records based on more than just the value of one field, you could
modify this method to use multiple fields, very easily.

The database that I am using has the following fields :

Contact Name, Address, City, State, ZIP and two fields that I created
for the purpose of identifying duplicates. Those fields are
"Duplicate" and "global".

The global field needs to have a storage option of global. i.e. the
same value for every record in the database.

The script is below. What isn't shown is what the sort option is.
The sort option is "Address" because address is the field that I am
searching on for duplicates.

The result of running this script will be that all records that are
duplicates except for one unique of the duplicates will be marked
"Dupe". After the script has been run, you need to do a search on
"Dupe" and delete those found records.

Here is how it works :

The script starts by showing all records in the database, then sorts
the records by address. It then begins to look through all of the
addresses and where the address of a record equals the address of the
prior record, that record gets flagged as a "Dupe". That's it.

If you try this and you like it, let me know, post back here, I'm
interested in your comments or suggestions.

Show All Records
Sort Records [Restore; No dialog]
Go to Record/Request/Page [First]
Copy [Select; DATA::Address]
Paste [Select; DATA::global]
Loop
Go to Record/Request/Page [Next; Exit after last]
If [DATA::Address = DATA::global]
Set Field [DATA:uplicate; "Dupe"]
End If
Copy [Select; DATA:Address]
Paste [Select; DATA::global]
End Loop

-Squeed


Reply With Quote
  #2  
Old   
Chris Brown
 
Posts: n/a

Default Re: Removing Duplicates In Filemaker The Easy Way - 06-17-2007 , 06:41 PM






squeed2000 (AT) yahoo (DOT) com wrote:
agged as a "Dupe". That's it.
Quote:
If you try this and you like it, let me know, post back here, I'm
interested in your comments or suggestions.

Show All Records
Sort Records [Restore; No dialog]
Go to Record/Request/Page [First]
Copy [Select; DATA::Address]
Paste [Select; DATA::global]
Loop
Go to Record/Request/Page [Next; Exit after last]
If [DATA::Address = DATA::global]
Set Field [DATA:uplicate; "Dupe"]
End If
Copy [Select; DATA:Address]
Paste [Select; DATA::global]
End Loop

-Squeed


for this kind of thing (matching) I use ( in a loop)
set $$_A = Fa
set $$_B = Fb

if $$_A = $$_B
set tag = 1


I also use this in a Comparator db for comparing table filed names ,
script code... and extract a single field of the complied differences.
THis example compares two text fields, such as
TableNames ( g_FileNameA)
TableNames ( g_FileNameB)

extract differences CF
Set Variable [ $$_L; Value:Let([
LA= PatternCount ( CF::CF1; "¶" );
LB = PatternCount ( CF::CF2; "¶" )
];
If( LA ≥ LB; LA; LB)
) ]
#
Allow User Abort [ On ]
Set Variable [ $$_n; Value:1 ]
Loop
Commit Records/Requests
Set Variable [ $$_A; Value:Trim(GetValue( CF::CF1 ; $$_n )) ]
Set Variable [ $$_B; Value:Trim(GetValue( CF::CF2 ; $$_n )) ]
If [ $$_A ≠ $$_B ]
Set Field [ CF:ifferences; CF:ifferences & "¶¶" & $$_A & "¶" &
$$_B ]
End If
Set Variable [ $$_n; Value:$$_n +1 ]
Exit Loop If [ $$_n = $$_L +3 ]
Commit Records/Requests
End Loop






The second script is more entertaining and parses the field lines of two
fields containing lists, to new related records. In the related table.
calcs colour flag mismatches, and the resultant two portals , allow A:B
and B:A differences to be illustrated (necessary when two fields may
contain the same actual content, but in different line order.
the initial g_Type is just a switch to allow the script to determine,
via script parameter, what layout it is on and hence what field pair to
compare



parse
Commit Records/Requests
[ No dialog ]
Set Field [ User::g_Type; Get ( ScriptParameter ) ]
Set Variable [ $$_F1; Value:Case(
User::g_Type = "TBL"; User::c_TBLNamesA;
User::g_Type = "TOC"; User::c_TOCNamesA;
User::g_Type = "LAYOUTS"; User::c_LayoutNamesA;
User::g_Type = "Fields"; User::c_LayoutFieldsA;
User::g_Type = "SCRIPTS"; User::c_ScriptNamesA;
User::g_Type = "RELS"; User::c_RelsA;
) ]
Set Variable [ $$_F2; Value:Case(
User::g_Type = "TBL"; User::c_TBLNamesB;
User::g_Type = "TOC"; User::c_TOCNamesB;
User::g_Type = "LAYOUTS"; User::c_LayoutNamesB;
User::g_Type = "Fields"; User::c_LayoutFieldsB;
User::g_Type = "SCRIPTS"; User::c_ScriptNamesB;
User::g_Type = "RELS"; User::c_RelsB;
) ]
Go to Layout [ “Diff” (Diff) ]
Show All Records
Delete All Records
[ No dialog ]
Go to Layout [ original layout ]
#
Allow User Abort [ On ]
Set Variable [ $$_n; Value:1 ]
Loop
Set Field [ User::g_n; $$_n ]
Commit Records/Requests
Set Field [ Diff_A new::A; GetValue( $$_F1 ; $$_n ) ]
Set Field [ Diff_B new::B; GetValue( $$_F2 ; $$_n ) ]
Set Variable [ $$_n; Value:$$_n +1 ]
Exit Loop If [ $$_n = PatternCount ( $$_F1 ; "¶" ) + 10 ]
End Loop



regards

Chris


Reply With Quote
  #3  
Old   
Matt WIlls
 
Posts: n/a

Default Re: Removing Duplicates In Filemaker The Easy Way - 06-18-2007 , 02:35 PM



In article
<1181960625.962979.327180 (AT) q69g2000hsb (DOT) googlegroups.com>squeed2000@yahoo
.com wrote:

Quote:
If you need to be able to identify duplicate records in your
Filemaker
database, read on, here is a simple way to accomplish this with
5minutes worth of work and all done within a script.
One of the things that Filemaker Pro (all versions) is missing is
agood duplicate record identification and removal/merger feature.
I've seen a lot of posts here on how to identify and remove
duplicates
but they all seem very complicated and involve linking and alternate
tables.

I've figured out a way to create a simple duplicate tagging
scriptthat you can use to identify duplcates in your Filemaker
database.

For this example, the database is called "DATA.fp7" and the script
is called "Flag Dupes". This script determines what records
areduplicate records based on records having an identical address.
Again, this is a simple example, if you wanted to identify duplicate
records based on more than just the value of one field, you
couldmodify this method to use multiple fields, very easily.

The database that I am using has the following fields :

Contact Name, Address, City, State, ZIP and two fields that I
created for the purpose of identifying duplicates. Those fields
are"Duplicate" and "global".

The global field needs to have a storage option of global. i.e.
thesame value for every record in the database.

The script is below. What isn't shown is what the sort option is.
The sort option is "Address" because address is the field that I
amsearching on for duplicates.

The result of running this script will be that all records that are
duplicates except for one unique of the duplicates will be marked
"Dupe". After the script has been run, you need to do a search
on"Dupe" and delete those found records.

Here is how it works :

The script starts by showing all records in the database, then sorts
the records by address. It then begins to look through all of the
addresses and where the address of a record equals the address of
theprior record, that record gets flagged as a "Dupe". That's it.

If you try this and you like it, let me know, post back here,
I'minterested in your comments or suggestions.

Show All Records
Sort Records [Restore; No dialog]
Go to Record/Request/Page [First]
Copy [Select; DATA::Address]
Paste [Select; DATA::global]
Loop
Go to Record/Request/Page [Next; Exit after last]
If [DATA::Address
ATA::global]
Set Field [DATA:uplicate; "Dupe"]
End If
Copy [Select; DATA:Address]
Paste [Select; DATA::global]
End Loop

-Squeed

It really isn't necessary to sort and loop, so as to compare one
record with the prior record. Please bear with me: it takes longer to
explain the logic than to write the script itself:
1. Find all of the duplicate records based on whatever field or
combination of fields you choose, as long as you're comfortable with
the criteria that makes the record a duplicate. This is done simply by
entering an exclamation point (!) in the field in Find Mode.
2. Trap for nothing in the found set.

3. Delete a record. Any record, it makes absolutely no difference
which one. If it is a duplicate according to Step 1, you already know
you don't want half of this pair. You'll still have another.
4. Do it again. Since we eliminated one record from a duplicate pair,
the remaining record is no longer a duplicate, and will not be in the
new found set. If there were more than two duplicates, the pair will
be part of the found set, and one will be deleted. If no duplicates
remain, the script stops.
Thus, the entire process can be reduced to 6 steps. Let's call this
script "Delete Dupes):
Perform Find [ Specified Find Requests: Find Records; Criteria:
YourTable::YourField: ?!? ] [ Restore ]
If [ Get ( FoundCount ) = 0 ]

Exit Script [ ]

End If

Delete Record/Request [ No dialog ]

Perform Script [ ?Delete Dupes? ]

If you prefer to execute the script just once, you can delete the last
step and enclose the rest in a Loop. In my opinion, that doesn't serve
much more purpose than to add another step.
This obviously does not provide for a confirming examination of marked
records after identifying them as duplicates. It does not allow for
variations (e.g., misspellings or an abbreviation versus a spelled-out
word), but then, neither does your script. It is workable (and
recommended) only if you are 100% sure that there should not be more
than one record in which the same information populates a particular
field.
I'm sure there are some who would consider this just a tad too "quick
and dirty," but it works just fine for my purposes, and I doubt you'll
find anything simpler. I am open to whatever criticism.
Matt









Reply With Quote
  #4  
Old   
Helpful Harry
 
Posts: n/a

Default Re: Removing Duplicates In Filemaker The Easy Way - 06-18-2007 , 04:15 PM



In article <nemoMon061807023856 (AT) news (DOT) verizon.net>, Matt WIlls
<Im (AT) Witz (DOT) End> wrote:

Quote:
In article
1181960625.962979.327180 (AT) q69g20...oglegroups.com>squeed2000@yahoo
.com wrote:

If you need to be able to identify duplicate records in your
Filemaker
database, read on, here is a simple way to accomplish this with
5minutes worth of work and all done within a script.
One of the things that Filemaker Pro (all versions) is missing is
agood duplicate record identification and removal/merger feature.
I've seen a lot of posts here on how to identify and remove
duplicates
but they all seem very complicated and involve linking and alternate
tables.

I've figured out a way to create a simple duplicate tagging
scriptthat you can use to identify duplcates in your Filemaker
database.

For this example, the database is called "DATA.fp7" and the script
is called "Flag Dupes". This script determines what records
areduplicate records based on records having an identical address.
Again, this is a simple example, if you wanted to identify duplicate
records based on more than just the value of one field, you
couldmodify this method to use multiple fields, very easily.

The database that I am using has the following fields :

Contact Name, Address, City, State, ZIP and two fields that I
created for the purpose of identifying duplicates. Those fields
are"Duplicate" and "global".

The global field needs to have a storage option of global. i.e.
thesame value for every record in the database.

The script is below. What isn't shown is what the sort option is.
The sort option is "Address" because address is the field that I
amsearching on for duplicates.

The result of running this script will be that all records that are
duplicates except for one unique of the duplicates will be marked
"Dupe". After the script has been run, you need to do a search
on"Dupe" and delete those found records.

Here is how it works :

The script starts by showing all records in the database, then sorts
the records by address. It then begins to look through all of the
addresses and where the address of a record equals the address of
theprior record, that record gets flagged as a "Dupe". That's it.

If you try this and you like it, let me know, post back here,
I'minterested in your comments or suggestions.

Show All Records
Sort Records [Restore; No dialog]
Go to Record/Request/Page [First]
Copy [Select; DATA::Address]
Paste [Select; DATA::global]
Loop
Go to Record/Request/Page [Next; Exit after last]
If [DATA::Address
ATA::global]
Set Field [DATA:uplicate; "Dupe"]
End If
Copy [Select; DATA:Address]
Paste [Select; DATA::global]
End Loop

-Squeed


It really isn't necessary to sort and loop, so as to compare one
record with the prior record. Please bear with me: it takes longer to
explain the logic than to write the script itself:
1. Find all of the duplicate records based on whatever field or
combination of fields you choose, as long as you're comfortable with
the criteria that makes the record a duplicate. This is done simply by
entering an exclamation point (!) in the field in Find Mode.
2. Trap for nothing in the found set.

3. Delete a record. Any record, it makes absolutely no difference
which one. If it is a duplicate according to Step 1, you already know
you don't want half of this pair. You'll still have another.
4. Do it again. Since we eliminated one record from a duplicate pair,
the remaining record is no longer a duplicate, and will not be in the
new found set. If there were more than two duplicates, the pair will
be part of the found set, and one will be deleted. If no duplicates
remain, the script stops.
Thus, the entire process can be reduced to 6 steps. Let's call this
script "Delete Dupes):
Perform Find [ Specified Find Requests: Find Records; Criteria:
YourTable::YourField: ?!? ] [ Restore ]
If [ Get ( FoundCount ) = 0 ]

Exit Script [ ]

End If

Delete Record/Request [ No dialog ]

Perform Script [ ?Delete Dupes? ]

If you prefer to execute the script just once, you can delete the last
step and enclose the rest in a Loop. In my opinion, that doesn't serve
much more purpose than to add another step.
This obviously does not provide for a confirming examination of marked
records after identifying them as duplicates. It does not allow for
variations (e.g., misspellings or an abbreviation versus a spelled-out
word), but then, neither does your script. It is workable (and
recommended) only if you are 100% sure that there should not be more
than one record in which the same information populates a particular
field.
I'm sure there are some who would consider this just a tad too "quick
and dirty," but it works just fine for my purposes, and I doubt you'll
find anything simpler. I am open to whatever criticism.
Matt

The problem here is that you are deleting records, and the computer can
never tell that they are definitely duplicates (depending on your data
of course). You really need to mark them as "duplicate" and then have a
human being go through making sure they are duplicates ... a long
boring process, but then all data entry work is. (


Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)


Reply With Quote
  #5  
Old   
d-42
 
Posts: n/a

Default Re: Removing Duplicates In Filemaker The Easy Way - 06-19-2007 , 04:17 PM



On Jun 18, 2:15 pm, Helpful Harry <helpful_ha... (AT) nom (DOT) de.plume.com>
wrote:
Quote:
In article <nemoMon061807023... (AT) news (DOT) verizon.net>, Matt WIlls



I... (AT) Witz (DOT) End> wrote:
In article
1181960625.962979.327... (AT) q69g2000hsb (DOT) googlegroups.com>squeed2000@yahoo
.com wrote:

If you need to be able to identify duplicate records in your
Filemaker
database, read on, here is a simple way to accomplish this with
5minutes worth of work and all done within a script.
One of the things that Filemaker Pro (all versions) is missing is
agood duplicate record identification and removal/merger feature.
I've seen a lot of posts here on how to identify and remove
duplicates
but they all seem very complicated and involve linking and alternate
tables.

I've figured out a way to create a simple duplicate tagging
scriptthat you can use to identify duplcates in your Filemaker
database.

For this example, the database is called "DATA.fp7" and the script
is called "Flag Dupes". This script determines what records
areduplicate records based on records having an identical address.
Again, this is a simple example, if you wanted to identify duplicate
records based on more than just the value of one field, you
couldmodify this method to use multiple fields, very easily.

The database that I am using has the following fields :

Contact Name, Address, City, State, ZIP and two fields that I
created for the purpose of identifying duplicates. Those fields
are"Duplicate" and "global".

The global field needs to have a storage option of global. i.e.
thesame value for every record in the database.

The script is below. What isn't shown is what the sort option is.
The sort option is "Address" because address is the field that I
amsearching on for duplicates.

The result of running this script will be that all records that are
duplicates except for one unique of the duplicates will be marked
"Dupe". After the script has been run, you need to do a search
on"Dupe" and delete those found records.

Here is how it works :

The script starts by showing all records in the database, then sorts
the records by address. It then begins to look through all of the
addresses and where the address of a record equals the address of
theprior record, that record gets flagged as a "Dupe". That's it.

If you try this and you like it, let me know, post back here,
I'minterested in your comments or suggestions.

Show All Records
Sort Records [Restore; No dialog]
Go to Record/Request/Page [First]
Copy [Select; DATA::Address]
Paste [Select; DATA::global]
Loop
Go to Record/Request/Page [Next; Exit after last]
If [DATA::Address
ATA::global]
Set Field [DATA:uplicate; "Dupe"]
End If
Copy [Select; DATA:Address]
Paste [Select; DATA::global]
End Loop

-Squeed

It really isn't necessary to sort and loop, so as to compare one
record with the prior record. Please bear with me: it takes longer to
explain the logic than to write the script itself:
1. Find all of the duplicate records based on whatever field or
combination of fields you choose, as long as you're comfortable with
the criteria that makes the record a duplicate. This is done simply by
entering an exclamation point (!) in the field in Find Mode.
2. Trap for nothing in the found set.

3. Delete a record. Any record, it makes absolutely no difference
which one. If it is a duplicate according to Step 1, you already know
you don't want half of this pair. You'll still have another.
4. Do it again. Since we eliminated one record from a duplicate pair,
the remaining record is no longer a duplicate, and will not be in the
new found set. If there were more than two duplicates, the pair will
be part of the found set, and one will be deleted. If no duplicates
remain, the script stops.
Thus, the entire process can be reduced to 6 steps. Let's call this
script "Delete Dupes):
Perform Find [ Specified Find Requests: Find Records; Criteria:
YourTable::YourField: ?!? ] [ Restore ]
If [ Get ( FoundCount ) = 0 ]

Exit Script [ ]

End If

Delete Record/Request [ No dialog ]

Perform Script [ ?Delete Dupes? ]

If you prefer to execute the script just once, you can delete the last
step and enclose the rest in a Loop. In my opinion, that doesn't serve
much more purpose than to add another step.
This obviously does not provide for a confirming examination of marked
records after identifying them as duplicates. It does not allow for
variations (e.g., misspellings or an abbreviation versus a spelled-out
word), but then, neither does your script. It is workable (and
recommended) only if you are 100% sure that there should not be more
than one record in which the same information populates a particular
field.
I'm sure there are some who would consider this just a tad too "quick
and dirty," but it works just fine for my purposes, and I doubt you'll
find anything simpler. I am open to whatever criticism.
Matt

The problem here is that you are deleting records, and the computer can
never tell that they are definitely duplicates (depending on your data
of course). You really need to mark them as "duplicate" and then have a
human being go through making sure they are duplicates ... a long
boring process, but then all data entry work is. (
I think often, in practice, we *can* rely on the detected duplicates
to be duplicate, and not go through them manually. If the computer
finds 50,000 duplicates in a database of 1,000,000 nobody is going to
manually check anything. Either you trust the machine and delete
without checking and accept the risk, or you just don't remove
duplicates.

That said, you can pretty easily augment Matt's technique to flag
records. E.g. prepend the value in the field with an xx: to flag it as
a duplicate. It will be omitted from future finds.

Of course, there are limitations and considerations to doing this, but
that's a separate issue.
-cheers,
Dave



Reply With Quote
  #6  
Old   
d-42
 
Posts: n/a

Default Re: Removing Duplicates In Filemaker The Easy Way - 06-19-2007 , 04:39 PM



On Jun 18, 12:35 pm, Matt WIlls <I... (AT) Witz (DOT) End> wrote:

Quote:
I'm sure there are some who would consider this just a tad too "quick
and dirty," but it works just fine for my purposes, and I doubt you'll
find anything simpler. I am open to whatever criticism.
Matt
Neat technique.
The performance characteristics of (find !, delete, repeat until
nothing is found) is obviously going to be determined by how fast
find ! is, and how many repetitions are necessary.

Now a basic a find x, on a single indexed field would be in O(log n)
but find ! can't 'home in' on its result and almost surely requires a
complete index traversal, so its probably O(n).

Unfortunately, although the number of duplicates shrinks with each
iteration the find! will have to do a complete index traversal each
time. And the number of repitions is determined by the number of
duplicates so this thing will perform at worst in n^2 time, but that
would only happen if every record was a duplicate, which isn't a going
to be a typical distribution.

Still the number of duplicates of an uncleaned data set is likely to
be a proportion of n, so the performance is still O(n^2).

If you aren't familiar with O notation, the upshot of all this is that
the other options that have been presented are all faster, on large
enough datasets.

Don't take that as criticism though. This technique is elegant and
simple, and for smaller data sets all solutions perform 'fast enough',
and coding simplicity / ease of maintenance trumps optimizing
performance.

-cheers,
Dave



Reply With Quote
  #7  
Old   
Scott in SoCal
 
Posts: n/a

Default Re: Removing Duplicates In Filemaker The Easy Way - 06-20-2007 , 12:09 PM



On Jun 15, 7:23 pm, squeed2... (AT) yahoo (DOT) com wrote:
Quote:
If you need to be able to identify duplicate records in your
Filemaker database, read on, here is a simple way to accomplish this [...]
That's pretty much how I do it. I've tried this and the "destructive
loop" approach that is also mentioned in this thread. I don't really
see much difference in the other method. Yours provides the
opportunity for manual review, but the other is SLIGHTLY faster to
implement.

You might also want to consider that the duplicates are being treated
as identical, but there could be other fields that don't match (say
you're matching on address, but the phone number might be more up-to-
date in another record). You might want to sort on the modification
date (descending), so the last record modified is the one you keep and
the others are marked as duplicates.

-Scott in SoCal




Reply With Quote
  #8  
Old   
Helpful Harry
 
Posts: n/a

Default Re: Removing Duplicates In Filemaker The Easy Way - 06-20-2007 , 03:49 PM



In article <1182359340.459262.31520 (AT) g37g2000prf (DOT) googlegroups.com>,
Scott in SoCal <ScottDoofus (AT) gmail (DOT) com> wrote:

Quote:
On Jun 15, 7:23 pm, squeed2... (AT) yahoo (DOT) com wrote:
If you need to be able to identify duplicate records in your
Filemaker database, read on, here is a simple way to accomplish this [...]

That's pretty much how I do it. I've tried this and the "destructive
loop" approach that is also mentioned in this thread. I don't really
see much difference in the other method. Yours provides the
opportunity for manual review, but the other is SLIGHTLY faster to
implement.

You might also want to consider that the duplicates are being treated
as identical, but there could be other fields that don't match (say
you're matching on address, but the phone number might be more up-to-
date in another record). You might want to sort on the modification
date (descending), so the last record modified is the one you keep and
the others are marked as duplicates.
Except that the last record might not be the one you want to keep - it
could have been entered with mistakes (eg. someone doingd data entry
from a list gets distracted and then enters the phone number from the
next line in the list by accident). This is why you should really
manually check all so-called duplicates.

Helpful Harry
Hopefully helping harassed humans happily handle handiwork hardships ;o)


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.