Author: Joel Kristenson
Last Updated: 5/8/2014
Overview
This article will show you the different ways you can find possible duplicate records (Voters/Donors/Contacts) in your database. Even the most conscientious organizations will get duplicate records from time to time due to uncontrollable factors.
Outline
#1 Find Possible Duplicates
#2 Assign Possible Duplicates to an Attribute En Masse
#3 Related Resources
#1 Find Possible Duplicates
Navigate to your contact list (Voters/Donors/Contacts). In this example I used a nonprofit database where we renamed Donors to People and Orgs. Click here to learn how to customize labels in your database.
Load a list of the records you want to search for duplicates. If your database has a large number of records (e.g. 100,000 to 1 million or more), you may want to run this process in smaller chunks. In my example I loaded all 10,496 records.
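If you do split a large database into chunks, it helps to split on something duplicates are likely to share, so that two copies of the same person end up in the same chunk. The sketch below is only an illustration in plain Python, not part of Trail Blazer: the record dictionaries, the `last` field name, and the `chunk_by_last_initial` helper are all hypothetical.

```python
from collections import defaultdict

def chunk_by_last_initial(records):
    """Split records into chunks keyed by the first letter of the last name.

    Hypothetical helper: records sharing a last name always land in the
    same chunk, so a duplicate pair is never split across two runs.
    """
    chunks = defaultdict(list)
    for rec in records:
        last = (rec.get("last") or "").strip()
        # Records with no last name go into a catch-all "#" chunk.
        initial = last[0].upper() if last else "#"
        chunks[initial].append(rec)
    return dict(chunks)

# Made-up sample data for illustration only.
records = [
    {"id": 1, "last": "Smith"},
    {"id": 2, "last": "smith"},
    {"id": 3, "last": "Jones"},
    {"id": 4, "last": ""},
]

chunks = chunk_by_last_initial(records)
for initial in sorted(chunks):
    print(initial, [rec["id"] for rec in chunks[initial]])
```

In practice you would build each chunk with your usual search filters (for example, a last name range) and run Find Possible Duplicates once per chunk.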
Click [File] > Utilities > Find Possible Duplicates
The “tie breaker” columns include:
- Middle Name
- Birthdates
- SOS (Secretary of State Number)
- Address
- Last Name
- First Name
Example: If you select only name and address, people with different middle names will be listed as possible duplicates.
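To make the effect of the tie breaker columns concrete, here is a minimal sketch in Python. It is not Trail Blazer's actual matching logic, which this article does not document. It simply assumes that two records are flagged when every selected column matches, and the field names and the `find_possible_duplicates` helper are hypothetical.

```python
from collections import defaultdict

def find_possible_duplicates(records, key_fields):
    """Return groups of records whose selected fields all match.

    Assumption for illustration: matching is exact after trimming
    whitespace and ignoring letter case.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple((rec.get(field) or "").strip().lower() for field in key_fields)
        groups[key].append(rec)
    # Only keys shared by two or more records are possible duplicates.
    return [group for group in groups.values() if len(group) > 1]

# Made-up sample data for illustration only.
people = [
    {"id": 1, "first": "John", "middle": "Allen", "last": "Smith", "address": "12 Oak St"},
    {"id": 2, "first": "John", "middle": "Brian", "last": "Smith", "address": "12 Oak St"},
    {"id": 3, "first": "Mary", "middle": "", "last": "Jones", "address": "9 Elm Ave"},
]

# Name and address only: records 1 and 2 are grouped together.
name_and_address = find_possible_duplicates(people, ["first", "last", "address"])

# Adding Middle Name as a tie breaker separates them again.
with_middle = find_possible_duplicates(people, ["first", "middle", "last", "address"])

print([[rec["id"] for rec in group] for group in name_and_address])
print(with_middle)
```

The more tie breaker columns you select, the stricter the match, and the fewer records are flagged.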
You can adjust the options to fit your needs. Every time you run the process, you will need to start with a new contact list.
If you lower the number of characters to compare in the address and name, you will likely find more duplicates, but some of them could be false positives. For instance, a Sr. and a Jr. living at the same address with similar names could be flagged.
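The trade-off can be sketched in a few lines of Python. This is an assumption about how character thresholds behave in general, not a description of Trail Blazer's internals: I am assuming each threshold is the number of leading characters compared for the first name, last name, and address, and the `match_key` and `is_possible_duplicate` helpers are hypothetical.

```python
def match_key(rec, first_chars, last_chars, address_chars):
    """Build a comparison key from the leading characters of each field."""
    return (
        rec["first"].strip().lower()[:first_chars],
        rec["last"].strip().lower()[:last_chars],
        rec["address"].strip().lower()[:address_chars],
    )

def is_possible_duplicate(a, b, first_chars, last_chars, address_chars):
    """Two records match when their truncated keys are identical."""
    key_a = match_key(a, first_chars, last_chars, address_chars)
    key_b = match_key(b, first_chars, last_chars, address_chars)
    return key_a == key_b

# Made-up father and son at the same address.
senior = {"first": "Robert", "last": "Johnson Sr.", "address": "400 Maple Street"}
junior = {"first": "Robert", "last": "Johnson Jr.", "address": "400 Maple Street"}

# With 8 characters, the "Sr."/"Jr." suffix is cut off and the pair matches.
loose = is_possible_duplicate(senior, junior, 8, 8, 8)

# With 12 characters, the suffix is compared and the pair no longer matches.
strict = is_possible_duplicate(senior, junior, 12, 12, 12)

print(loose, strict)
```

Lower thresholds widen the net, so review the results before merging or deleting anything.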
In my example I lowered all 3 thresholds to 8. Click [OK] when you’re ready to run the process.
Trail Blazer will display a progress bar. If you’re working with millions of records, the process will take some time, and you may want to launch a separate session of TBZ.
Once the process is complete, Trail Blazer will show you the results and recommend assigning the list an attribute of “Possible Duplicates”. Click here to learn how to set attributes en masse.
In my example, the process found 36 possible duplicates in my list of 10,496 records.