deleteCache Service: Removing Cache Keys in Solr Services

By ZoomInfo Engineering, Helen Liu, October 15, 2024

Background

ZoomInfo’s search service relies heavily on Apache Solr, an open-source search platform, for its real-time search capabilities, ease of use, and adaptability. For added flexibility and customization, Solr supports plugins. Solr plugins are JAR files that extend Solr’s capabilities with custom code, making it possible to build unique applications on top of its search engine.

What is the Problem?

Solr is not the only data source that ZoomInfo uses, however. ZoomInfo’s Join Query Plugins bring data from other data sources into Solr, so that queries can take advantage of Solr’s features while still drawing on whichever sources suit a given data requirement. The deletion operation, for example, is built into these plugins: they are responsible for deleting data and keys from sources such as cache databases. Deleting in real time is important because it avoids stale data and keeps the data sources in sync.

The Join Query Plugins are more than simple plugins; they form a complex system that serves a wide range of functions and is extremely powerful in making Solr work for ZoomInfo’s needs. Because they are so versatile and so widely used across ZoomInfo, they are also resource-intensive and require Solr to handle a large amount of search service processing.

What is the Solution?

To reduce the heavy workload on Solr and its plugins, the project described in this article offloads some of the cache deletion processing, introduced in the previous section as an example of plugin functionality, to a separate Solr microservice. The Delete Cache Service builds on the existing Missing Ids Service, which first established Solr Services. With cache key deletion moved out of the Join Query Plugins and into a separate service, the plugins no longer perform the deletion themselves; they simply send a web request to the service, which reduces the processing Solr has to handle for any request that includes a delete operation.

What Does the Solution Look Like?

The diagram above is a high-level view of the proposed solution’s major architectural components, leaving out some in-depth implementation details. Here is a breakdown of the components shown:

  • SOLR/Solr Plugins: The current plugins that customize Solr’s functionality. Key deletion is implemented here today, and this is where it should be offloaded from.
  • solr-services: A separate Solr microservice that supports Solr plugins by taking over some of their processing work.
  • solr-services-client: The client library that solr-plugins uses to make requests to solr-services. It avoids adding extra dependencies to the plugins.
  • Service Request: The structure of the request sent from plugins to solr-services. It includes the keys to be deleted and the information that identifies which database to delete from.
  • REST: The interface through which solr-services receives DeleteCache requests from Solr plugins (through solr-services-client).
  • DeleteCacheController: The REST controller that receives delete cache requests. It holds a single instance of DeleteCacheService, which initializes Redis database access once when solr-services starts. For every request, it verifies the request and calls DeleteCacheService’s delete method.
  • DeleteCacheService: Initializes Redis database access based on the specified configuration and handles the actual deletion of keys from the correct Redis databases.
  • DeleteCacheService.delete(): The method in DeleteCacheService that uses the Redis database access to delete the specified keys.
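
To make the controller and service roles more concrete, here is a minimal Java sketch of how they could fit together. It is not ZoomInfo’s actual code. The request fields (database, keys), the CacheAccess interface, and the in-memory stand-in for Redis are all assumptions, chosen so the example runs without a Redis server or a web framework.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical request shape: which cache database to target and which keys to remove.
record DeleteCacheRequest(String database, List<String> keys) {}

// Hypothetical abstraction over one Redis database; a real service would wrap a Redis client.
interface CacheAccess {
    long delete(List<String> keys); // returns how many keys were actually removed
}

// In-memory stand-in for Redis so the sketch is runnable on its own.
final class InMemoryCacheAccess implements CacheAccess {
    private final Set<String> stored = new HashSet<>();

    InMemoryCacheAccess(String... initialKeys) {
        for (String key : initialKeys) {
            stored.add(key);
        }
    }

    @Override
    public long delete(List<String> keys) {
        long removed = 0;
        for (String key : keys) {
            if (stored.remove(key)) {
                removed++;
            }
        }
        return removed;
    }

    boolean contains(String key) {
        return stored.contains(key);
    }
}

// Holds the database accesses created once at startup and routes each delete to the right one.
final class DeleteCacheService {
    private final Map<String, CacheAccess> accessByDatabase;

    DeleteCacheService(Map<String, CacheAccess> accessByDatabase) {
        this.accessByDatabase = new HashMap<>(accessByDatabase);
    }

    long delete(DeleteCacheRequest request) {
        CacheAccess access = accessByDatabase.get(request.database());
        if (access == null) {
            throw new IllegalArgumentException("Unknown cache database: " + request.database());
        }
        return access.delete(request.keys());
    }
}

// Framework-free stand-in for the REST controller: verify the request, then delegate.
final class DeleteCacheController {
    private final DeleteCacheService service;

    DeleteCacheController(DeleteCacheService service) {
        this.service = service;
    }

    long handle(DeleteCacheRequest request) {
        if (request == null || request.database() == null || request.database().isBlank()
                || request.keys() == null || request.keys().isEmpty()) {
            throw new IllegalArgumentException("Request must name a database and at least one key");
        }
        return service.delete(request);
    }
}
```

In this sketch the map of database accesses is built once and reused for every request, mirroring the single DeleteCacheService instance described above. Returning the number of removed keys is a design choice of the sketch, not something the article specifies.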

Brief Process Overview

In short, the components work together as follows. The deletion process starts in SOLR/Solr plugins. When a query calls for a deletion, the plugins decide where the removals happen and create the keys to delete; they also perform any deletion that has not yet been moved to Solr Services. Using solr-services-client, the plugins send the keys to be deleted to Solr Services as a web request. There, DeleteCacheController receives the request, and DeleteCacheService maps it to the correct database and calls its deletion method with the requested keys.
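
The plugin side of this flow, where the keys are packaged into a web request, might look roughly like the sketch below. It uses only the JDK’s java.net.http API. The class name, the /delete-cache path, and the JSON field names are hypothetical, and the hand-rolled JSON encoding escapes only quotes and backslashes; a real client library would more likely use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of what a client such as solr-services-client could do.
final class DeleteCacheClientSketch {
    private final URI endpoint;

    DeleteCacheClientSketch(String baseUrl) {
        // "/delete-cache" is an assumed path, not the real service route.
        this.endpoint = URI.create(baseUrl + "/delete-cache");
    }

    // Minimal JSON encoding: handles quotes and backslashes only.
    static String toJson(String database, List<String> keys) {
        String keyArray = keys.stream()
                .map(key -> "\"" + escape(key) + "\"")
                .collect(Collectors.joining(","));
        return "{\"database\":\"" + escape(database) + "\",\"keys\":[" + keyArray + "]}";
    }

    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the POST request; the caller decides when and how to send it.
    HttpRequest buildRequest(String database, List<String> keys) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJson(database, keys)))
                .build();
    }
}
```

Sending the request would then be a matter of passing it to HttpClient.newHttpClient().send(...) or sendAsync(...). The sketch keeps building and sending separate, which leaves the choice between blocking and asynchronous delivery to the caller.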

Tracking Performance 

To evaluate the performance impact of this change, metrics must be added on both sides: in Solr Services for requests received and processed, and in Solr plugins for requests sent. For this project, metrics are measured and displayed with Datadog.
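
As a simple illustration of the three counts involved, the sketch below keeps them in process with JDK classes only. It does not use the Datadog client, and the metric names are made up; in practice each increment would be reported to the metrics backend rather than held in a local map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// In-process stand-in for a metrics client; the metric names below are hypothetical.
final class RequestCounters {
    static final String SENT = "plugins.delete_cache.sent";
    static final String RECEIVED = "services.delete_cache.received";
    static final String PROCESSED = "services.delete_cache.processed";

    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Thread-safe increment; creates the counter on first use.
    void increment(String metric) {
        counters.computeIfAbsent(metric, name -> new LongAdder()).increment();
    }

    long count(String metric) {
        LongAdder adder = counters.get(metric);
        return adder == null ? 0 : adder.sum();
    }

    // Requests the plugins sent that the service has not finished processing.
    long outstanding() {
        return count(SENT) - count(PROCESSED);
    }
}
```

Comparing requests sent against requests received and processed is what makes a gap visible, for example requests that were dropped in transit or that failed during deletion.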

End Goal

What has been completed so far is only the beginning. The end goal of the project is to move the majority of the deletion process out of Solr plugins and into Solr Services. Ideally, the deleteCache service will eventually no longer receive deletion requests from plugins at all; instead, requests will come directly from other search processing that occurs before Solr and its plugins. In other words, once the project is complete, Solr plugins will no longer act as the middleman between deletion processing and the deleteCache service, which relieves Solr of the remaining deletion work.

Summary

Solr and ZoomInfo’s Join Query Plugins are strong tools that perform many essential functions for real-time search, but because they are so useful, they have also become burdened with a heavy workload and many dependencies. Deletion is critical to keeping data well maintained in a timely manner, and moving part of it out of Solr plugins and into a separate service, Solr Services, offloads some of that work from Solr. There are many ways to build on the deleteCache service so that more of the deletion process can migrate to Solr Services, but simply having the Join Query Plugins send requests to Solr Services, which then deletes the cache keys, already reduces the processing the plugins must do.
