Why ZoomInfo Uses RPC for Search Microservices
By Justin Sweeney, February 12, 2020
One of the big decisions when ZoomInfo chose to split its search service away from the main monolithic application and turn it into a microservice was the communication method to use between the microservice and the code calling it. RESTful APIs are common, but there are many other options out there. After careful examination of the options and the ZoomInfo use cases, ZoomInfo decided to use RPC (specifically, gRPC).
What is RPC
RPC, or remote procedure call, has been around for decades. Essentially, it provides a mechanism for running subroutines, methods, procedures, functions, or other bits of code on an external system without knowing the details of that system or its contents.
The traditional RPC mechanism uses stub functions (methods, procedures, etc.) on both the client application (the application making the remote procedure call) and the server application (the application processing the remote procedure call) that look exactly like code being executed locally. The client stub marshals the pertinent information and asks the local system to send it in a message to the server system. The server system passes the incoming message to the server stub, which unpacks the message and processes it, then packs up the results to reverse the process back to the client stub.
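The stub round-trip described above can be sketched in a few lines of Python. This is a minimal illustration, not a real RPC framework: the procedure, its data, and the stub names are all hypothetical, and the network transport is simulated by a direct function call so the marshalling steps stay visible.

```python
import json

# --- Server side (hypothetical service) ---
def get_company_size(name):
    # The actual procedure; in a real system this runs on a remote machine.
    sizes = {"ZoomInfo": 2500}
    return sizes.get(name, 0)

def server_stub(message: bytes) -> bytes:
    # Unpack the incoming message, run the procedure, pack up the result.
    request = json.loads(message.decode("utf-8"))
    result = get_company_size(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

# --- Client side ---
def client_stub(name):
    # Marshal the call into a message. The transport is simulated here by
    # calling server_stub directly instead of sending bytes over a network.
    message = json.dumps({"procedure": "get_company_size",
                          "args": [name]}).encode("utf-8")
    reply = server_stub(message)
    return json.loads(reply.decode("utf-8"))["result"]

print(client_stub("ZoomInfo"))  # looks exactly like a local call; prints 2500
```

To the caller, `client_stub("ZoomInfo")` is indistinguishable from a local function, which is the entire point of the stub pattern.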
For more information on RPC, see this article.
What is gRPC
Google released gRPC as an open source project in 2015. It is an updated take on RPC designed primarily for use with microservices housed in cloud architectures. In addition to the traditional request-response model, gRPC supports streaming, permitting parallel processing of large amounts of data or time series processing of ongoing data links.
In addition to the direct client-server communication found in traditional RPC frameworks, gRPC includes modern features like authentication, load balancing, fast stub generation, and more.
For more information about gRPC, visit its product page.
What is Protobuf
Protobuf, or protocol buffers, is the mechanism Google designed to serialize data in gRPC. Protobuf uses its own interface definition language to define the data to transport over gRPC, then provides code generators that translate protobuf definitions into several common client and server languages, producing code to either marshal data into outgoing messages or unpack it from incoming messages. It was designed to be smaller and faster than XML (and is generally also smaller than JSON). The terms protocol buffer and protobuf are used to describe both the entire serialization mechanism and the resulting data structures sent via gRPC.
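As a concrete example of the interface definition language, here is a small hypothetical message definition (this is not ZoomInfo's actual schema, just the general shape of a search request):

```proto
syntax = "proto3";

// Hypothetical message for illustration only.
message CompanySearchRequest {
  string query = 1;            // field numbers identify fields on the wire
  int32 max_results = 2;
  repeated string industries = 3;
}
```

Running this definition through `protoc` generates client and server classes, in the chosen languages, that handle all of the marshalling and unmarshalling automatically.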
For more information about protobuf, visit its Google Developer Network page.
What Else Did ZoomInfo Consider
ZoomInfo also considered using RESTful APIs with its search microservices. REST APIs are extremely common and well understood, are easily supported in most relevant tooling and technologies (programming languages, frameworks, IDEs, etc.), and have a large ecosystem and support base around them.
Benefits of gRPC
ZoomInfo decided to use the gRPC flavor of RPC for three reasons:
- Improved performance
- Streaming capabilities
- Secondary support for REST APIs included
Improved performance
Remote procedure calls are designed to improve performance by offloading processing to external machines. gRPC improves performance further through other aspects of its design, such as support for streaming (see below) and its use of HTTP/2, whose binary framing layer supports streaming natively.
The exact performance benefits of using gRPC instead of options like REST APIs, SOAP, or other online data exchange mechanisms vary greatly depending on the specifics of the data being exchanged and the call definitions.
Some of the benefit comes from using protobuf as the message format instead of JSON or XML. It was designed to be smaller and more compact than XML and is also more compact than JSON for many data structures. Having a smaller number of bytes representing the same data will automatically make transferring that data faster if all other things are equal. In the case of protocol buffers, packing and unpacking the data is also optimized for many types of data, improving performance further.
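Part of protobuf's compactness comes from its wire format: small integers are encoded as variable-length "varints" of 7 payload bits per byte, preceded by a one-byte tag combining the field number and wire type. The sketch below implements that real, documented encoding in plain Python for a hypothetical `max_results` field (field number 2) and compares the byte count against the equivalent JSON:

```python
import json

def encode_varint(value: int) -> bytes:
    """Protobuf's variable-length integer encoding: 7 bits per byte,
    with the high bit set on every byte except the last."""
    out = bytearray()
    while True:
        bits = value & 0x7F
        value >>= 7
        if value:
            out.append(bits | 0x80)  # more bytes follow
        else:
            out.append(bits)         # final byte
            return bytes(out)

# Field number 2, wire type 0 (varint), value 300.
tag = encode_varint((2 << 3) | 0)
wire = tag + encode_varint(300)

as_json = json.dumps({"max_results": 300}).encode("utf-8")
print(len(wire), len(as_json))  # → 3 20
```

Three bytes on the wire versus twenty for JSON, because protobuf ships a one-byte field tag instead of the repeated string key `"max_results"`. The savings compound across every field of every message.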
ZoomInfo has not explicitly tested the performance benefits from gRPC or protocol buffers for its specific data structures – that would require building independent REST APIs that handle the same exact data transfers – but other companies have performed such tests on their own data or on generated sample data. While some tests found little or no performance benefit from using protobuf over JSON, most published studies do show a performance benefit from protobuf over JSON or over JSON and XML. In particular, Auth0 published a direct comparison of the performance of protobuf vs JSON in REST APIs and found that protobuf was 600% faster for their simple test data structures. This article from Criteo Labs compares the performance of protobuf, Thrift, JSON, XML, and Apache Avro in C# and found protocol buffers were faster than either JSON or XML for both small and large data structures (Thrift was the absolute winner for small objects and Avro for large, but protobuf was a close second in both cases).
Streaming capabilities
Another way gRPC provides such great performance is by supporting streaming data rather than relying solely on synchronous request-response calls. Requests for large amounts of data can be returned to the requesting client in many parallel streams of data, essentially creating a highly threaded model that is much faster than a single data connection. This greatly increases the rate at which the response can be processed, improving the performance of such requests.
In addition, gRPC supports bidirectional streaming, meaning that a client can send a stream of requests while simultaneously receiving a stream of responses over the same connection. This, too, increases the amount of simultaneous data processing and thus improves the performance of systems with a large number of requests. ZoomInfo does not currently use bidirectional streaming but may do so in the future if we identify use cases that might benefit from it.
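The server-streaming flow can be sketched with a plain Python generator (this is not the grpcio API, just the shape of the interaction, with hypothetical names): the server yields results in chunks as each chunk becomes ready, and the client begins processing the first chunk before the last one even exists.

```python
from typing import Iterator, List

# Hypothetical streaming search: rather than assembling one huge response,
# the server yields results chunk by chunk as soon as each is ready.
def search_stream(query: str, total: int, chunk_size: int) -> Iterator[List[str]]:
    for start in range(0, total, chunk_size):
        yield [f"{query}-result-{i}"
               for i in range(start, min(start + chunk_size, total))]

# The client consumes chunks as they arrive instead of waiting for a
# single monolithic response body.
received = []
for chunk in search_stream("acme", total=10, chunk_size=4):
    received.extend(chunk)

print(len(received))  # 10
```

In real gRPC the generator on the server side and the iterator on the client side are connected by HTTP/2 frames, so the overlap between producing and consuming happens across the network rather than in one process.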
Information about the supported streaming options can be found on the gRPC Concepts page.
Secondary support for REST APIs
Another benefit of gRPC is support for REST API calls with JSON payloads as a secondary data transfer mechanism. This allows teams calling into search to use data formats and endpoints that are familiar to them and, in some cases, compatible with the rest of their codebase. For example, the ZoomInfo Enterprise API uses JSON as its primary data format; those APIs are heavy users of search functionality. Being able to redirect those JSON payloads directly to the search microservice via simple REST API calls made it easy to support their use without requiring expensive transformations between JSON and protobuf or requiring experts in REST APIs to use much less familiar code mechanisms to support search requests. While not necessarily a major deciding factor for the search team, this secondary support certainly made it easier to select gRPC for use with the search microservice and was very much appreciated by its consumers.
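One common way to expose a gRPC method as a REST endpoint is the `google.api.http` annotation, understood by transcoding proxies such as grpc-gateway and Envoy. The service, method, and path names below are hypothetical; the annotation syntax itself is standard:

```proto
import "google/api/annotations.proto";

service Search {
  rpc Query(CompanySearchRequest) returns (CompanySearchResponse) {
    // Expose the same RPC as a REST endpoint accepting a JSON body.
    option (google.api.http) = {
      post: "/v1/search"
      body: "*"
    };
  }
}
```

With this in place, a caller can `POST` a JSON payload to `/v1/search` and the proxy translates it to and from protobuf, so JSON-first consumers never touch gRPC directly.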
Drawbacks of gRPC
There were some drawbacks to choosing gRPC over REST APIs:
- A lack of secondary support tools (and less functionality in those that do exist)
- The surrounding infrastructure is more complex
REST APIs come with an entire ecosystem of knowledge, support tools, and infrastructure assistance. A large swath of people know how to use, build, test, release, and document them. API documentation can be produced more easily with tools like Swagger, apidoc, RAML, Postman, or a myriad of other options. Live API consoles are supported using Postman, Dev HTTP Client (DHC), Talend API Tester, Mulesoft APIkit Console, or many others. Questions about how to do something – anything – or a problem you run into during the data design or endpoint development stages will likely be quickly answered on sites like StackOverflow or on the online help forums for specific tools and technologies.
Similar tools exist for gRPC, but there are far fewer options, and they tend to have less functionality, be harder to use, and have a much smaller user base capable of providing informal support when needed.
Not only are tools for managing the implementation of REST APIs plentiful, but the process itself is simpler than for gRPC. Supporting streaming is inherently more complex than supporting a single direct request-response data transfer. Further, most REST APIs use HTTP/1.1, the standard transport mechanism in wide use since 1997, while gRPC uses HTTP/2 to leverage its binary framing layer and native streaming support. This makes some of the tooling and support applications more complex, as many options are designed for use with HTTP/1.1. For example, ZoomInfo’s initial issues with load balancing search microservices (outlined in this blog post) were caused at least in part by the differences between HTTP/1.1 and HTTP/2.
ZoomInfo decided that the benefits of using gRPC far outweighed the drawbacks and, for the most part, has been happy with that decision. The lack of secondary support tools proved to be a larger issue than originally expected and definitely slowed down the development process a bit, but it was balanced out by the secondary support for REST APIs that allowed the ZoomInfo Enterprise API and other internal search clients to make calls in the format they’re used to using with the JSON data structures they already have in their own code. If you need to transfer large amounts of bulk data as quickly as possible, gRPC and its support for parallel data streaming is definitely a good choice.