dbTalk Databases Forums  

Merging duplicate fields inside a database

comp.databases.filemaker comp.databases.filemaker


Discuss Merging duplicate fields inside a database in the comp.databases.filemaker forum.



Reply
 
Thread Tools Display Modes
  #1  
Old   
D.
 
Posts: n/a

Default Merging duplicate fields inside a database - 06-16-2005 , 02:19 PM






Using filemaker pro 5.5 for windows:
I have a database that has survey information for a study I am
conducting. Each subject has a unique ID number to identify them. The
data is contained in several filemaker databases that have been merged
into one master file. I merged into the master file by matching the ID
numbers and "updating current records in current found set". Despite
doing this about a fifth of my records have duplicate ID numbers. What
I want to do is merge records with the same ID numbers into one record.
To do this manually would be very difficult because there are so many
fields for each record.
Ideally I need a function which would work like this:

Given two below duplicate records with non-overlapping field entries:

Record: 1238
Field1: yes
Field2:
Field3:
Field4: no

Record: 1238

Field1:
Field2: $250
Field3: Male
Field4:

Merge both records into:

Record: 1238

Field1: yes
Field2: $250
Field3: Male
Field4: no

If the program could also notify me if there are overlaps between the
fields it would be great.

Does anyone have any suggestions?

Thanks,
..d


Reply With Quote
  #2  
Old   
eyebrown@mindspring.com
 
Posts: n/a

Default Re: Merging duplicate fields inside a database - 06-17-2005 , 06:32 AM






You might try splitting off the dupes into a separate (temporary)
database. Make a dupe finder script. The manual that came with FM 5.5
has a real nice one it uses as its example script in the Scriptmaker
chapter. I've adapted this to many things. Essentially, you find groups
of dupe records and flag all but one of each group.

Find the flagged records and export them someplace. Export as "Filemaker
Pro File". Delete the dupes from your main database.

Now you can import the missing data from your newly exported database.
Make your record id the match field (double arrowhead) and have it
populate the fields. Essentially an update rather than an import process.

A couple of questions:

1. Can there be more than one dupe record id? Or are there always either
one or two of them?

2. Is the non-overlapping data the same? That is, is it always Field2
and Field3 empty in one of a set of two dupes, and the reverse in the
other? Does your data look exactly like your example?

If so, there might be a simpler way, or at least a less risky way. Make
your found set, export & delete as above. Make a relationship between the
main database and your exported file on record id. Now you can fill
Field2 and Field3 from the exported file using the Replace with
Calculation function.

As ever, back up, make copies, you cannot recover from most of these operations.

Steve Brown

In article <1118949554.853924.99090 (AT) g43g2000cwa (DOT) googlegroups.com>, "D."
<dtae (AT) yahoo (DOT) com> wrote:

Quote:
Using filemaker pro 5.5 for windows:
I have a database that has survey information for a study I am
conducting. Each subject has a unique ID number to identify them. The
data is contained in several filemaker databases that have been merged
into one master file. I merged into the master file by matching the ID
numbers and "updating current records in current found set". Despite
doing this about a fifth of my records have duplicate ID numbers. What
I want to do is merge records with the same ID numbers into one record.
To do this manually would be very difficult because there are so many
fields for each record.
Ideally I need a function which would work like this:

Given two below duplicate records with non-overlapping field entries:

Record: 1238
Field1: yes
Field2:
Field3:
Field4: no

Record: 1238

Field1:
Field2: $250
Field3: Male
Field4:

Merge both records into:

Record: 1238

Field1: yes
Field2: $250
Field3: Male
Field4: no

If the program could also notify me if there are overlaps between the
fields it would be great.

Does anyone have any suggestions?

Thanks,
.d

Reply With Quote
  #3  
Old   
D.
 
Posts: n/a

Default Re: Merging duplicate fields inside a database - 06-17-2005 , 03:42 PM



Quote:
1. Can there be more than one dupe record id? Or are there always either
one or two of them?
Yes, usually there is only 1, but I have a few with 2 dupes.


Quote:
2. Is the non-overlapping data the same? That is, is it always Field2
and Field3 empty in one of a set of two dupes, and the reverse in the
other? Does your data look exactly like your example?
Not always. Part of my problem is I want to integrate about about 50
fields together. It becomes difficult to manually see where the overlap
occurs between the dupes.

Ideally what I would have happen is dupes would be compared to the
original, if the field is empy dupe value would be placed in, if not
empty it would prompt me to look at it in detail. Do you know how
filemaker handles such situations when importing? Is it creating more
dupes on these occasions?

Thanks for your help. I can tell I am at least getting on the right
track now.

..d



Reply With Quote
  #4  
Old   
eyebrown@mindspring.com
 
Posts: n/a

Default Re: Merging duplicate fields inside a database - 06-18-2005 , 07:10 AM



In article <1119040938.806279.200040 (AT) o13g2000cwo (DOT) googlegroups.com>, "D."
<dtae (AT) yahoo (DOT) com> wrote:

Quote:
Ideally what I would have happen is dupes would be compared to the
original, if the field is empy dupe value would be placed in, if not
empty it would prompt me to look at it in detail. Do you know how
filemaker handles such situations when importing? Is it creating more
dupes on these occasions?
Importing may not be the way to go for you. I suspect that if a field is
empty in the source, it will overwrite live data in the target with
"nothing".

But maybe there's another way. Using the two files, set up a relationship
between them on your ID field. Write a looping script that goes through
your primary database record by record and compares each field to the same
field in the matching record in the exported database. Something like:

Go To Record/Request/Page [First]
If IsEmpty(FieldA),
Set Field (FieldA, ExportDB::FieldA)
EndIf
If IsEmpty(FieldB),
Set Field (FieldB, ExportDB::FieldB)
EndIf
.... etc.
Loop
Go To Record/Request/Page [Exit After Last, Next]
If IsEmpty(FieldA),
Set Field (FieldA, ExportDB::FieldA)
EndIf
If IsEmpty(FieldB),
Set Field (FieldB, ExportDB::FieldB)
EndIf
.... etc.
End Loop


This would be a pain to set up, with fifty fields to check, and it would
probably take quite a while to run. But it's better than data entry! At
least you can set up one of the If statements, and duplicate the whole
thing, then change the field name. After you have made fifty If
statements for the first record, you can duplicate all fifty at once to
place inside your Loop.

Your sentence, "if not empty it would prompt me to look at it in detail."
made me think that perhaps you want to flag these records for later
study. Make a basic number field, call it CheckTheseFlag. Add an else
condition to each of the If statements above:


If IsEmpty(FieldA),
Set Field (FieldA, ExportDB::FieldA)
else
If FieldA <> ExportDB::FieldA,
Set Field (CheckTheseFlag, "1")
EndIf
EndIf

Now your script will fill all empty fields from the dupe set. Records
where identical data exists in both source & target fields will be left
alone. Records where data exists in both, but the data is different will
have the flag field set, but the data will otherwise be left alone.

Let it run, either go get a cup of coffee, or let it run overnight
(depending on the size of the database). When it's done, you should only
need to worry about flagged records. Do a Find on CheckTheseFlag and pull
up the group where differeing data existed in a field in both the primary
record and its dupe. If this group is large, you have a problem and you
have to decide what to do about the conflicting data. It it is relatively
small, a dozen or so, you should be able to manually fix them.

An easy way to do this is to make an All Fields layout in your primary
database. Push all the fields over to the left. On the right insert the
matching field from across the relationship to your dupe file. Do this
for every field. Now, in Browse Mode, you can see what data is in every
field of a record and its dupe at a glance. Go through your flagged dozen
and do what you need to do.

If you ended up making a third file, where you checked for dupes in your
dupe file (dupes that were originally three records), and exported &
deleted these dupes into the third file, then you can run the whole
process again between your primary database and the third file.

If you have to do this, it will not be pretty, having to redo those fifty
If statements. You might experiment by editing the relationship to point
to the third file. The original script should now point to fields in the
third file instead of the second (Scriptmaker updates such reference
changes), but it might not. I just tested it and it seemed to work, but
I'd be careful doing this.

Steve Brown


Reply With Quote
Reply




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off



Powered by vBulletin Version 3.5.3
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.