This month I'll tackle the dreaded DUPLICATES. There's little point in storing records if you have two for the same person with different addresses in them, one from a old address they moved out of a year ago. FileMaker has a good built-in duplicate search facility; you just type "!" in a field in Find mode. But it has some limits.
For one thing, you have to decide just what is a duplicate? It's quite possible that several people could have the same name; it's also possible that people with different names could be living at the same address, related or unrelated. Once you've decided, then you should create a concatenated (put together) field with the criteria. You can also use some of the text functions to keep the size down.
Duplicate Field Calculation
Here's an example Dupe field (calculation, text) for a contact-type file:
Dupe = Left (First, 3) & Left (Last, 3) & Right (Main Phone, 4)
Search with a "!" in that field and you should only get one person. Remember, no more than 20 characters without a space in an ID field, and less than 60 total (indexing limits).
One problem with the built-in Find is that it finds all the records that are the same, including the original. Then you have to look at them to decide what to do with them. Another limit of Finds is that you have to run the search. It would be better if you could be notified automatically while trying to create a new record. To do that you'll have to use a self-relationship.
First, you must have an auto-entered Serial ID for each record (always); I use a text field, with a letter then zeros in front (ex., A00001).
Self-Relationship
Create the relationship (I'll call it "Self_Dupe") in the same file with the Dupe field on both sides; Dupe= ::Dupe (same Dupe field from earlier).
Create a new calculated field, type number, DupeMark= If (SerialID = (Self_Dupe::SerialID), "", 1)
The serial ID will only equal itself for the first record of the relationship. If you enter a new record and the same data is in the Dupe field, it will get marked with the 1.
Duplicate Warning
Now create your warning. It would be a calculated text field, say "DupeWarn."
If (DupeMark = 1, "DUPLICATE", "")
Put it on the data entry layout where there's some space. Set it to not allow entry and take it out of the tab order. Make the text red, but with transparent background and no line.
It normally won't show, but enter a duplicate and it will jump out. Since it's based on a relationship it's unstored, so it doesn't even require exiting a field or record to trigger it. (Which is important. It doesn't do much good to have a warning that only shows up after the user leaves the layout. You can't count on them clicking outside of any fields before exiting.)
When You Want Duplicate Records
Using the same Duplicate relationship you can do a number of things. Often times you want duplicate records for a person or item in a database, and you need to track them. You may have several invoices for a person, or whatever; the principle of self-relation is still the same. Use a field that is always the same in the group of records and create a self-relationship on that field. Usually it's a text ID field.
Aggregate Functions with Self-Relationships
If you want to find out how many records, just count it with a calculated number field:
Count Dupe = Count (Self_Dupe::Serial ID)
It will instantly update. You can use similar self-relationships with an identifying key to total any number fields in its set of records. For example, let's say you had a number field, Charges.
Create a calculated field, Person's Charges= Sum (Self_Dupe::Charges)
You could get the same total with summary functions, but this one is more reliable, as it doesn't depend on sort order. You can even use it in other calculations, such as, Balance Due=
Sum (Self_Dupe::Charges) - Sum (Self_Dupe::Payments)
They are unstored calculations, therefore slow, but so is sorting and going into Preview mode (another way to get this total).
Marking First or Last Entries
If you want to mark which is the first entry in the group, use this:
If (SerialID = Self_Dupe::SerialID, 1, "")
It's just the opposite of the duplicate calculation.
To find the latest entry you can use the Maximum function, but you have to do a little compensation for the fact that the Serial ID is a text field.
If (TextToNum(SerialID) = Max(Self_Dupe::SerialID), 1, "")
This will also find all non-duplicates.
To find all duplicates that are not the latest entry, exchange the 1 and "";
If(TextToNum(SerialID) = Max(Self_Dupe::SerialID),"", 1)
or use "<>" (does not equal);
If(TextToNum(SerialID) <> Max(Self_Dupe::SerialID), 1, "")
If you want to use other criteria for the first entry other than when it was entered, then you can do that also, by using the minimum or maximum aggregate functions.
For example, the record with the biggest payment for a person: If (Payment = Max(Self_Dupe::Payment, 1, "")
Last Function
Actually, in speed tests it has been found that the Max function is kind of slow. There is a better way. We know that Go To Related Record [Show, "Relationship"] will go to all the related records, and that it will go to the first record in the relationship (which is the first in the Sort order, if the relationship is sorted, FM4 and up).
But there is also a function to go to the last in the relationship. It's hidden in the Function list, under Repeating Fields. It is the Last() function. If you put a related field inside the parenthesis, it will go to the last record of the relationship.
So, If (TextToNum(SerialID) = Last(Self_Dupe::SerialID), 1, "") will find the last entered record, the same as
If (TextToNum(SerialID) = Max(Self_Dupe::SerialID), 1, ""), and is a hair faster (but definitely cooler).
Deleting Duplicates
You can now reliably find a set of only the duplicates of records, any number of them, excluding either the first or last entries. You can then add the single script step:
Delete All Records
That should do it.
Make these marking fields visible on a list view layout. Experiment on test records until you are sure that they are doing what you want before committing to an automated Delete All script.
Don't worry, once you've got them down they will work well, with probably less error than if you did it by hand.
Send comments, questions, toys to Fenton Jones