The Differences Between Search and Match
By Justin Sweeney, January 14, 2020
ZoomInfo supports two mechanisms for retrieving bulk data based on queries: search and match. Both use Apache Solr to retrieve records based on a set of supplied input data, but they work in different ways and return different results. This blog post looks at how each method works, when to use each one, and how the results they return compare.
How Search Works
Search takes a set of input criteria for a person, a company, a scoop, or an intent and returns the records that match all of the inputs; only records that satisfy 100% of the input criteria are considered for the response. For names, search associates certain names with a limited number of common nicknames and returns the nickname matches as valid results, ranked below the results that use the exact name provided. Partial word matches are also returned but ranked lower; a partial match must occur at the start of a word unless the search term was explicitly added as a searchable term for that record. For example, a search for companies named Smith returns companies named Smith and Smithers but not companies named Westsmith or Blacksmiths. GlaxoSmithKline is returned, but only because the term smith was explicitly added to its profile as a searchable term.
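The ranking tiers described above (exact name, then nickname, then partial word at the start of a word or via an explicitly added searchable term) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ZoomInfo's implementation: the nickname groups, the numeric tiers, and the per-record `searchable_terms` tuple are hypothetical stand-ins for data the post does not publish.

```python
# Hypothetical nickname groups: any name in a group matches any other name in it.
NICKNAME_GROUPS = [
    {"james", "jim", "jimmy", "jamie"},
    {"jonathan", "jon", "john"},
]

# Hypothetical ranking tiers; a higher tier ranks first, 0 means "not returned".
EXACT, NICKNAME, PARTIAL, NO_MATCH = 3, 2, 1, 0


def is_nickname(a: str, b: str) -> bool:
    return any(a in group and b in group for group in NICKNAME_GROUPS)


def term_tier(query: str, words: list[str], searchable_terms: tuple[str, ...] = ()) -> int:
    """Classify how a single query term matches a record's words."""
    q = query.lower()
    lowered = [w.lower() for w in words]
    if q in lowered:
        return EXACT
    if any(is_nickname(q, w) for w in lowered):
        return NICKNAME
    # Partial matches must sit at the start of a word, unless the term
    # was explicitly added to the record as a searchable term.
    if any(w.startswith(q) for w in lowered) or q in (t.lower() for t in searchable_terms):
        return PARTIAL
    return NO_MATCH


def search_companies(query: str, companies: list[tuple[str, tuple[str, ...]]]) -> list[str]:
    """Return matching company names, best tier first."""
    hits = [(term_tier(query, name.split(), terms), name) for name, terms in companies]
    hits = [(tier, name) for tier, name in hits if tier != NO_MATCH]
    hits.sort(key=lambda hit: hit[0], reverse=True)  # stable: ties keep input order
    return [name for _, name in hits]


companies = [
    ("Smithers Inc", ()),
    ("Westsmith LLC", ()),
    ("Smith & Co", ()),
    ("Blacksmiths Guild", ()),
    ("GlaxoSmithKline", ("smith",)),  # "smith" explicitly added as a searchable term
]
print(search_companies("Smith", companies))
# ['Smith & Co', 'Smithers Inc', 'GlaxoSmithKline']
```

Westsmith and Blacksmiths are filtered out because "smith" does not start any of their words, while GlaxoSmithKline only qualifies through its explicitly added term.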
If the search term for a specific field contains multiple words, a record must match all of the words (in any order) to be considered an exact match. When searching on names, a term can also match a middle name stored in the record, even though the middle name is generally not displayed. For example, a search on Rob Frank could return someone named Frank Jeffries if his middle name is Robinson or some other term that matches rob.
All records that match a search are available for the response, and the order of the words in a multiple-word term is considered only for ranking, not for matching. For a search on someone named James Smith, records with the names James Smith, James F. Smith, Jameson Smith, Smith Jameson, Jamie Smith, and Jimmy Harris are all valid matches, but records with the names John Smythe, Harry Smitty, and Greg Jamison are not.
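The multiple-word rule can be sketched as "every query word must match some name word, in any order", with word order used only to rank the survivors. This is a self-contained illustration, not ZoomInfo's code: the tiny nickname table, the `Person` type, and the same-order ranking signal are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical nickname groups (the real table is not published).
NICKNAME_GROUPS = [{"james", "jim", "jimmy", "jamie"}, {"robert", "rob", "bob"}]


@dataclass
class Person:
    first: str
    last: str
    middle: str = ""  # searchable, but generally not displayed

    def searchable_words(self) -> list[str]:
        return [w.lower().strip(".") for w in (self.first, self.middle, self.last) if w]


def word_matches(q: str, w: str) -> bool:
    """Prefix match at the start of a word, or a nickname match."""
    return w.startswith(q) or any(q in g and w in g for g in NICKNAME_GROUPS)


def is_match(query: str, person: Person) -> bool:
    """Every query word must match some name word (first, middle, or last), in any order."""
    words = person.searchable_words()
    return all(any(word_matches(q, w) for w in words) for q in query.lower().split())


def same_order(query: str, person: Person) -> bool:
    """Ranking-only signal: does the query follow the displayed first/last order?"""
    return query.lower().split() == [person.first.lower(), person.last.lower()]


def search_people(query: str, people: list[Person]) -> list[Person]:
    hits = [p for p in people if is_match(query, p)]
    # Word order never filters records; it only moves same-order names up (stable sort).
    return sorted(hits, key=lambda p: same_order(query, p), reverse=True)


people = [
    Person("Smith", "Jameson"), Person("James", "Smith"), Person("John", "Smythe"),
    Person("Jameson", "Smith"), Person("James", "Smith", "F."), Person("Jamie", "Smith"),
    Person("Harry", "Smitty"), Person("Greg", "Jamison"),
]
results = search_people("James Smith", people)
```

Under this sketch's assumptions, a record such as Jimmy Harris would qualify for James Smith only through a middle name that matches smith, because the displayed names alone do not contain that term.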
The search service is used directly by various features of the combined DiscoverOrg and ZoomInfo platform, and it is also available through the Enterprise API and directly to users of the combined platform’s web application.
Search requests do not limit the number of possible results; both the API and the application page the results when necessary for performance or parsing. The web application presents 25 records per page; the API uses 25 records per page by default but lets users request up to 100 records per page. Records are returned ordered either by relevancy or by a sort criterion (the available criteria are preset and differ for each type of record). The initial results are always calculated with the default relevancy sorting; when a predefined sort option is selected, the search is sent back to Solr to be reordered. The sorting itself is performed with a special boosting formula in which the sort criterion overpowers all of the other criteria considered in the scoring process.
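Two ideas from this paragraph can be sketched concretely: turning a page number into Solr's standard `start` and `rows` paging values, and a boost that lets a sort criterion overpower relevancy. The real ZoomInfo boosting formula is not described, so `boosted_order` below is a hypothetical encoding chosen because it provably makes the sort key dominate: each step in sort-key rank is worth more than the largest possible relevancy gap, and relevancy only breaks ties.

```python
from typing import Any, Callable, TypeVar

T = TypeVar("T")

DEFAULT_PAGE_SIZE = 25  # web app page size and API default, per the post
MAX_PAGE_SIZE = 100     # API maximum, per the post


def page_window(page: int, page_size: int = DEFAULT_PAGE_SIZE) -> tuple[int, int]:
    """Translate a 1-based page number into Solr-style (start, rows) values."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    return (page - 1) * page_size, page_size


def boosted_order(records: list[T], sort_key: Callable[[T], Any],
                  relevancy: Callable[[T], float], descending: bool = True) -> list[T]:
    """Order records by sort_key first, using relevancy only to break ties."""
    if not records:
        return []
    rel = [relevancy(r) for r in records]
    # Any two relevancy scores differ by less than `span`, so one step in the
    # sort-key rank always outweighs every relevancy difference.
    span = max(rel) - min(rel) + 1
    distinct = sorted({sort_key(r) for r in records}, reverse=not descending)
    rank = {value: i for i, value in enumerate(distinct)}
    combined = [rank[sort_key(r)] * span + rel[i] for i, r in enumerate(records)]
    order = sorted(range(len(records)), key=lambda i: combined[i], reverse=True)
    return [records[i] for i in order]


companies = [
    {"name": "A", "employees": 50, "relevancy": 9.0},
    {"name": "B", "employees": 5000, "relevancy": 1.0},
    {"name": "C", "employees": 5000, "relevancy": 3.0},
]
ordered = boosted_order(companies, lambda c: c["employees"], lambda c: c["relevancy"])
print([c["name"] for c in ordered])  # ['C', 'B', 'A']
print(page_window(3))  # (50, 25)
```

Company A has the highest relevancy but still lands last when sorting by employee count, which is the "sort criterion overpowers everything else" behavior the paragraph describes.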
The default relevancy calculation is a modified version of Solr’s default relevancy calculation. It boosts records that include data for certain fields, such as company name and email address, and gives a slight preference to larger companies and to people with higher positions within a company. For example, if a CEO and a software engineer at the same company share the same name and position is not specified as an input criterion, the CEO is ranked ahead of the software engineer, all other things being equal.
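A hedged sketch of those adjustments: start from a base text-match score (standing in for Solr's own relevancy score) and add small boosts for populated fields, company size, and seniority. Every weight and the `level` labels below are hypothetical; the post names the factors but not their values.

```python
import math

# Hypothetical boost values; the real weights are not published.
FIELD_PRESENCE_BOOST = {"company_name": 0.5, "email": 0.5}
SENIORITY_BOOST = {"c_level": 0.20, "vp": 0.15, "director": 0.10, "manager": 0.05}
SIZE_BOOST_PER_DECADE = 0.02  # added per factor of 10 in employee count


def relevancy(text_score: float, record: dict) -> float:
    """Adjust a base text-match score with small, record-level preferences."""
    score = text_score
    # Reward records that include data for certain fields.
    score += sum(boost for field, boost in FIELD_PRESENCE_BOOST.items() if record.get(field))
    # Slight preference for larger companies (log scale keeps the effect small).
    score += SIZE_BOOST_PER_DECADE * math.log10(1 + record.get("employees", 0))
    # Slight preference for more senior positions.
    score += SENIORITY_BOOST.get(record.get("level", ""), 0.0)
    return score


ceo = {"company_name": "Acme", "email": "pat@acme.example", "employees": 1200, "level": "c_level"}
engineer = {"company_name": "Acme", "email": "pat.l@acme.example", "employees": 1200, "level": "staff"}
# Same name, same company, same text score: the CEO ranks ahead.
print(relevancy(1.0, ceo) > relevancy(1.0, engineer))  # True
```

Keeping the size and seniority boosts small relative to the text score is what makes them tie-breakers rather than the main driver of the ranking.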
When to Use Search
Search should be used to find exactly corresponding records when some information about a person, a company, a scoop, or an intent is already known. Only records that exactly match the data provided for each input field are returned (with the caveats above regarding nicknames and multiple-word search terms), so the more information you supply, the more effectively search narrows down the possible candidates.
Searches generally return more than one possible option, especially for common names or searches within large companies. By default, those results are returned in order of likelihood, and the top results are likely to be the records the search was trying to find.
How Match Works
The match service takes a set of input criteria for either a person or a company and uses it to return possible matches based on the inputs. The match service is available to end users of the combined DiscoverOrg and ZoomInfo web application via the Enhance feature. It is also used by various features of the combined platform directly and is available via the Enterprise API.
The match service lets its ZoomInfo clients (the services and features of the application that use match) set the matching threshold and specify how many records to return, if that many qualify as matches. In most cases, features set the return count to one, using the match service to retrieve the single best match.
Match always returns the requested number of records regardless of how well each one meets the specified criteria; records do not need to match all of the input criteria and are returned in order from best to worst match. Even if one very close match is found and every other candidate is significantly less likely to be the desired person or company, the requested number of matches is still returned, unless there are no matches or not enough matches available.
The threshold value balances quantity against quality: a low threshold allows more records to match a set of input criteria, although some of them may not be very close matches, while a high threshold restricts the number of matching records but ensures that those that do match are close, quality matches to the input criteria. Features that use match often apply a default threshold that provides a good balance, and API requests always use it. Even so, a match request commonly produces hundreds of potential matches.
A scoring process is run on each potential candidate. It assigns a weighted score to each attribute and adds the scores together into a combined score that indicates how closely the record matches the input criteria as a whole. This score orders the records in the response, and the response is cut off once the requested number of records has been included.
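The weighted-sum, threshold, and cut-off steps can be sketched as follows. This is an illustration under assumptions, not the match service itself: the attribute weights, the default threshold of 3.0, and the 0.5 credit for a looser prefix match are all hypothetical choices made for the example.

```python
# Hypothetical attribute weights; the real weights are not published.
WEIGHTS = {"email": 5.0, "first_name": 1.5, "last_name": 2.0, "company": 2.0, "title": 1.0}


def attribute_score(query_value: str, record_value: str) -> float:
    """1.0 for an exact match, 0.5 for a looser prefix match, else 0.0."""
    q, r = query_value.strip().lower(), record_value.strip().lower()
    if not q or not r:
        return 0.0
    if q == r:
        return 1.0
    if r.startswith(q) or q.startswith(r):
        return 0.5
    return 0.0


def match(query: dict[str, str], candidates: list[dict[str, str]],
          threshold: float = 3.0, count: int = 1) -> list[dict[str, str]]:
    """Score every candidate, keep those at or above the threshold, return the best `count`."""
    scored = []
    for record in candidates:
        # Weighted score per attribute, summed into one combined score.
        total = sum(
            WEIGHTS.get(field, 0.0) * attribute_score(value, record.get(field, ""))
            for field, value in query.items()
        )
        if total >= threshold:  # the threshold decides which candidates qualify
            scored.append((total, record))
    scored.sort(key=lambda s: s[0], reverse=True)  # best to worst
    return [record for _, record in scored[:count]]  # cut off at the requested count


candidates = [
    {"first_name": "Jon", "last_name": "Lee", "company": "Acme"},
    {"first_name": "Jonathan", "last_name": "Lee", "company": "Globex"},
    {"first_name": "Jonathan", "last_name": "Lee", "company": "Acme"},
]
query = {"first_name": "Jonathan", "last_name": "Lee", "company": "Acme"}
print(match(query, candidates))
# [{'first_name': 'Jonathan', 'last_name': 'Lee', 'company': 'Acme'}]
```

All three candidates clear the threshold, but with `count=1` only the top scorer comes back; the looser Jon match scores 4.75 against 5.5 and only appears once the requested count is raised to two.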
Match also supports partial term matching, nickname lookups for certain common contact names, and the other features described under search. However, because a match response contains so few records, results produced by this looser type of matching rarely appear in actual match results.
When to Use Match
Although match typically returns only one record (or a handful) considered the best possible results for a request, it can be used to cast a wide net for records corresponding to a limited amount of initial data about a person or a company before trimming that list down to the most likely fit or fits. It also suits features that automate a process and remove record selection from the user by simply selecting and using the single best fit.
Search vs Match: A Direct Comparison
Whether search or match is the better option for a specific request depends on how the results will be used and what input information you have available. Your environment also matters: if you are an end user of the combined platform, you’ll generally use search for most data retrieval requests (Enhance requests do use match). Search is also the only option for retrieving scoops or intents.
If you’re trying to find a specific person or company about which you know a limited amount of exact data (for instance, a company located in California with the word Performance in its name), search is almost certainly going to provide better results: it returns only companies that meet all of those constraints and lets you look through all of the exact matches to decide which one you want. However, if you have a lot of input data, or the data is distinctive enough that there is likely only one corresponding record in the data set, match is the better option (if available).
If you’re not sure of all of the information you have (for instance, you want to identify someone you talked to last week named Jon or John, possibly also listed as Jonathan, who is VP of something or other at a semiconductor company), a search for people whose names start with Jo and who hold VP positions at semiconductor companies should return a list of results that includes your contact. You can sort the results by name and jump to the records with the first names Jon, John, or Jonathan to look for the specific person you want. Alternatively, you can simply search for Jon, John, or Jonathan, since ZoomInfo search returns all three names as matches for any one of them.
Note: Match will still consider people named Jon or John when you search for Jonathan along with whatever other data you have about the person, but they are unlikely to be returned: they will be ranked below all people named Jonathan and will almost certainly fall outside the requested number of results.
Match is a good fit for a feature that performs bulk lookups and wants to return a single result per input as a starting point for users. For example, if an end user uploads a list of people with an assortment of names, email addresses, titles, and company names and wants to find the matching ZoomInfo records, match can pick the single best match for the data available for each contact and present it to the user for review.
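The bulk flow above can be sketched as "parse each uploaded row, keep only the columns that have data, ask a matcher for one result, and pair the two for review". The CSV columns, the `Matcher` signature, and the toy email-lookup matcher are all hypothetical; a real integration would call match through the Enterprise API, whose request format is not covered here.

```python
import csv
import io
from typing import Callable, Optional

Record = dict[str, str]
# Hypothetical matcher signature: (input criteria, count) -> records, best first.
Matcher = Callable[[Record, int], list[Record]]


def bulk_lookup(csv_text: str, match: Matcher) -> list[tuple[Record, Optional[Record]]]:
    """Pair each uploaded row with its single best match (or None) for review."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the columns that actually have data for this contact.
        criteria = {k: v.strip() for k, v in row.items() if k and isinstance(v, str) and v.strip()}
        best = match(criteria, 1)  # count = 1: just the single best match
        results.append((criteria, best[0] if best else None))
    return results


# Toy matcher for illustration only: looks records up by exact email.
directory = {"ada@example.com": {"id": "p-1", "name": "Ada Lovelace"}}


def toy_match(criteria: Record, count: int) -> list[Record]:
    hit = directory.get(criteria.get("email", ""))
    return [hit][:count] if hit else []


upload = "name,email,title\nAda Lovelace,ada@example.com,Analyst\nAlan Turing,,Researcher\n"
for criteria, best in bulk_lookup(upload, toy_match):
    print(criteria["name"], "->", best["id"] if best else "no match")
```

Passing the matcher in as a function keeps the review step independent of how matching is actually performed, so the same loop works whether the matcher is a local stub or a remote service.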
Final Thoughts on Search vs Match
Both match and search use Apache Solr to find records in the ZoomInfo person, company, scoop, and intent data stores that meet specified input criteria. Search looks for exact matches and returns all of the matches it finds. Match casts a wider net and returns a set number of records, some of which may meet only some of the specified criteria. Both are used by other ZoomInfo services and features, and both are also available through ZoomInfo APIs. Choose the one better suited to your specific use case and known input data to find the records you need.