Using Solr Plugins to Join Data

By Joel Chou, October 17, 2019

ZoomInfo uses a series of Apache Solr instances to store searchable data about people, companies, and several other things. These are independent instances that do not, by default, know anything about each other or share any data. However, there are times when a logical search crosses these boundaries or requires external data in combination with the information stored in Solr. Enter Solr plugins.

What Are Solr Plugins?

While Solr is feature rich, Apache knows that it can’t be all things to all people. To accommodate this, Solr supports a plugin development model that lets users extend existing Solr functions, customizing their behavior to better suit each user’s specific needs.

Solr plugins are JAR files that extend specific Java classes identified as extensible in the Solr class structure. Among other things, they can be used to define extra logic performed for each search, modify the search result output format, control how user data gets cached, control how data is indexed, define custom data types, and execute specific code whenever particular events occur.

For more information, see the Apache Solr Plugin Confluence page.

The ZoomInfo Solr Plugins for Joining Data

ZoomInfo has developed a range of Solr Plugins that add different types of functionality to our product. The plugins discussed in this article extend our search capability by joining data across multiple Solr instances or between a Solr instance and an external database. This allows users to search across different types of data and also to search against their own or their organization’s history.

ZoomInfo currently supports the following plugins to do this:

  • zoomjoin
  • companyjoin
  • solr2solrjoin
  • techjoin
  • orgjoin
  • netnewjoin

The zoomjoin Plugin

The zoomjoin plugin was the first ZoomInfo Solr plugin. There are two versions – one for contacts and one for companies – that allow search results to be filtered against historical user records that indicate which records the user has previously accessed. This supports two types of searches:

  1. Return all contact or company data I’ve seen before matching the specified criteria (in other words, what is my history with this search)
  2. Return all contact or company data I’ve never seen before matching the specified criteria (in other words, return new search results that I haven’t seen in the past)

This plugin joins either the company or the person instance of Solr with an Apache HBase database storing the user history information. Matching search results from the Solr instance are joined to the HBase instance storing user history to filter out either those results that have been presented to the user before or those that haven’t.
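The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ZoomInfo's plugin code: the function name `filter_by_history`, the in-memory `history` set standing in for the HBase lookup, and the shape of the result documents are all hypothetical.

```python
def filter_by_history(results, seen_ids, seen):
    """Keep results the user has (seen=True) or has not (seen=False) accessed.

    `results` stands in for documents matched by the Solr search and
    `seen_ids` for the record IDs in the user's history (assumed shapes).
    """
    return [doc for doc in results if (doc["id"] in seen_ids) == seen]


# Hypothetical Solr matches and user history.
matches = [{"id": "c1"}, {"id": "c2"}, {"id": "c3"}]
history = {"c2"}

previously_seen = filter_by_history(matches, history, seen=True)
never_seen = filter_by_history(matches, history, seen=False)
```

The single `seen` flag covers both search types listed above: the same join either keeps or removes the records found in the history.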

The companyjoin Plugin

The companyjoin plugin allows users to search for people based on company data. For instance, you can find people who work for companies within a certain revenue range with their headquarters in a particular state or region.

This plugin joins person instances of Solr with an HBase database storing company IDs. An initial search is performed on the company instance of Solr and the relevant company IDs are stored in HBase. Those IDs are then joined with the person instance of Solr to return a list of people who work at those companies.
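As a rough sketch of this two-step flow, the Python below first collects matching company IDs and then filters people by employer. Everything here is assumed for illustration: the function `company_join`, the field names (`revenue_musd`, `hq_state`, `company_id`), and the use of a plain set in place of the HBase table.

```python
def company_join(companies, people, company_filter):
    # Step 1: search the company data and collect the matching company IDs.
    # The article stores these IDs in HBase; a set stands in for it here.
    company_ids = {c["id"] for c in companies if company_filter(c)}
    # Step 2: keep only the people whose employer is in that ID set.
    return [p for p in people if p["company_id"] in company_ids]


# Hypothetical company and person documents.
companies = [
    {"id": 1, "revenue_musd": 50, "hq_state": "MA"},
    {"id": 2, "revenue_musd": 500, "hq_state": "MA"},
    {"id": 3, "revenue_musd": 75, "hq_state": "CA"},
]
people = [
    {"name": "Ana", "company_id": 1},
    {"name": "Ben", "company_id": 2},
    {"name": "Cy", "company_id": 3},
]


def mid_revenue_in_ma(company):
    # Revenue between 10 and 100 (millions USD) with headquarters in MA.
    return 10 <= company["revenue_musd"] <= 100 and company["hq_state"] == "MA"


result = company_join(companies, people, mid_revenue_in_ma)
```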

The solr2solrjoin Plugin

The solr2solrjoin plugin allows users to search across criteria stored in two different types of Solr instances. For example, you can search for companies of a certain size with specific intent criteria or people with a specific title associated with a particular kind of scoop.

This plugin joins any two types of ZoomInfo Solr instances, first finding matching items in one instance, then keeping only those that also match in the second.
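A minimal sketch of that two-phase join follows. It assumes the two instances share a join key (here a hypothetical `company_id` field); the function name `solr_to_solr_join` and the sample fields are illustrative and not taken from the actual plugin.

```python
def solr_to_solr_join(first_docs, second_docs, first_pred, second_pred, key):
    # Phase 1: find the items that match the criteria in the first instance.
    first_matches = [d for d in first_docs if first_pred(d)]
    # Phase 2: collect the join keys of items matching in the second instance,
    # then keep only the first-phase matches whose key appears there.
    second_keys = {d[key] for d in second_docs if second_pred(d)}
    return [d for d in first_matches if d[key] in second_keys]


# Hypothetical documents from a company instance and an intent instance.
companies = [
    {"company_id": 1, "employees": 5000},
    {"company_id": 2, "employees": 20},
    {"company_id": 3, "employees": 1200},
]
intents = [
    {"company_id": 1, "topic": "crm"},
    {"company_id": 2, "topic": "crm"},
    {"company_id": 3, "topic": "hr"},
]

large_with_crm_intent = solr_to_solr_join(
    companies,
    intents,
    first_pred=lambda c: c["employees"] >= 1000,
    second_pred=lambda i: i["topic"] == "crm",
    key="company_id",
)
```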

The techjoin Plugin

The techjoin plugin allows users to search for companies that started or stopped using a specific technology within a set date range. For instance, you can find companies in New England that stopped using PeopleSoft in Q1 2019.

This plugin joins the company instance of Solr with a Google Cloud Bigtable database storing information about which technologies companies use now or used in the past, and when they started or stopped using them. The search is first performed on the Bigtable data to find companies that match the search criteria, and the results are then joined with the company instance of Solr to return other information about each company in the response.
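The order of operations (technology history first, company details second) can be sketched as below. The event records, field names, and the `tech_join` function are assumptions made for the example; a list and a dictionary stand in for Bigtable and the Solr company instance.

```python
from datetime import date


def tech_join(tech_events, companies_by_id, technology, event, start, end):
    # Search the technology history first (Bigtable in the article).
    matching_ids = {
        e["company_id"]
        for e in tech_events
        if e["technology"] == technology
        and e["event"] == event
        and start <= e["date"] <= end
    }
    # Then join to the company documents to enrich the response.
    return [companies_by_id[i] for i in sorted(matching_ids) if i in companies_by_id]


# Hypothetical technology events and company documents.
tech_events = [
    {"company_id": 1, "technology": "PeopleSoft", "event": "stopped", "date": date(2019, 2, 10)},
    {"company_id": 2, "technology": "PeopleSoft", "event": "stopped", "date": date(2019, 5, 1)},
    {"company_id": 3, "technology": "PeopleSoft", "event": "started", "date": date(2019, 1, 15)},
]
companies_by_id = {
    1: {"id": 1, "name": "Acme"},
    2: {"id": 2, "name": "Globex"},
    3: {"id": 3, "name": "Initech"},
}

# Companies that stopped using PeopleSoft in Q1 2019.
stopped_q1 = tech_join(
    tech_events, companies_by_id, "PeopleSoft", "stopped",
    date(2019, 1, 1), date(2019, 3, 31),
)
```

A regional restriction such as "in New England" would be one more filter applied to the company documents after the join; it is left out here to keep the sketch short.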

The orgjoin Plugin

The orgjoin plugin is similar to the zoomjoin plugin but it looks at organizational history rather than a specific user’s individual history. There are two versions – one for contacts and one for companies – that allow search results to be filtered against historical organization records that indicate which records any user from the organization has previously accessed. This supports two types of searches:

  1. Return all contact or company data anyone from my organization has seen before matching the specified criteria (in other words, what is my organization’s history with this search)
  2. Return all contact or company data that has never been seen before by anyone from my organization matching the specified criteria (in other words, return new search results that my organization hasn’t seen in the past)

This plugin joins either the company or the person instance of Solr with an HBase database storing the organization history information. A preliminary search is done on the Solr instance, and its results are then joined to the HBase instance storing organizational history to filter out either those results that have been presented to someone in the organization before or those that haven’t.
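One way to picture the difference from a per-user history is to treat the organization's history as the union of its members' histories. That modeling choice is an assumption made for this sketch; the article does not say how the organization records are built or stored. The names `org_seen_ids`, `history_by_user`, and `org_members` are hypothetical.

```python
def org_seen_ids(history_by_user, org_members):
    # Assumption: a record counts as seen by the organization
    # if any one of its members has accessed it.
    seen = set()
    for user in org_members:
        seen |= history_by_user.get(user, set())
    return seen


# Hypothetical per-user histories; carol belongs to a different organization.
history_by_user = {"alice": {"p1"}, "bob": {"p2"}, "carol": {"p3"}}
org_members = ["alice", "bob"]

solr_matches = ["p1", "p2", "p3", "p4"]
seen_by_org = org_seen_ids(history_by_user, org_members)

org_history = [r for r in solr_matches if r in seen_by_org]
new_to_org = [r for r in solr_matches if r not in seen_by_org]
```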

The netnewjoin Plugin

The netnewjoin plugin supports recurring searches, returning new results daily from saved searches.

This plugin joins company or person Solr instances with an HBase database storing saved searches. Saved searches are taken from the HBase database and used each day to look for results added to the relevant Solr instance in the 24 hours since the last time the searches were run. The results of these searches are fed back to the users who created each search.
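The daily run can be sketched as a loop over saved searches that keeps only documents added since the 24-hour cutoff. The `added_at` field, the shape of a saved search (an `id`, an `owner`, and a `matches` predicate), and the function `net_new_results` are all assumptions for illustration; the real plugin reads saved searches from HBase and queries Solr.

```python
from datetime import datetime, timedelta


def net_new_results(saved_searches, docs, now):
    # Only documents added in the 24 hours before this run count as net new.
    cutoff = now - timedelta(hours=24)
    report = {}
    for search in saved_searches:
        new_ids = [
            d["id"] for d in docs
            if d["added_at"] > cutoff and search["matches"](d)
        ]
        # Results are grouped per search so they can be sent to its owner.
        report[search["id"]] = {"owner": search["owner"], "new_ids": new_ids}
    return report


# Hypothetical person documents and one saved search.
docs = [
    {"id": "a", "title": "CTO", "added_at": datetime(2019, 10, 16, 12, 0)},
    {"id": "b", "title": "CTO", "added_at": datetime(2019, 10, 10, 9, 0)},
    {"id": "c", "title": "CFO", "added_at": datetime(2019, 10, 17, 1, 0)},
]
saved_searches = [
    {"id": "s1", "owner": "dana", "matches": lambda d: d["title"] == "CTO"},
]

report = net_new_results(saved_searches, docs, now=datetime(2019, 10, 17, 6, 0))
```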

Final Thoughts on Joining Data Using Solr Plugins

Using Solr plugins allows ZoomInfo to greatly expand its search capabilities while maintaining clean, focused data stores and segregating data into its own silos for faster access and easier maintenance.
