ZoomInfo decided to break its search service away from the main monolithic application and make it a microservice used by that application and, potentially, other ZoomInfo microservices or applications. See this blog post for a discussion of why and how we did this.
All told, we’re happy with that decision, but it definitely required some adjustments on both the technology and the process side. This article will provide an overview of the benefits and challenges ZoomInfo identified within the process area while adopting the search microservice.
The three biggest process benefits from microservices are:
- Independence of development, testing, and deployment
- Speed of development, testing, and deployment
- Automatic recovery from many problems
Because the microservice isolates the development of a single feature from the rest of the application, it operates on its own schedule, regardless of what other teams are doing. The only external factors are the interface contract and any performance promises, which essentially feed in as requirements to the microservice’s independent SDLC. Further, when stripping functionality out of an existing monolith into a new microservice, there’s no requirement that the monolith be ready to interact with the microservice before the microservice is deployed. When the initial version or a subsequent new version is deemed ready for testing, it can simply be tested. When it’s deemed ready for deployment, it can simply be deployed. If the consumers of the service lag behind, the microservice will be sitting there waiting for them whenever they’re ready. ZoomInfo has perhaps seen the largest benefit from testing in isolation. In addition to feature isolation (more on this later), we’ve been able to test specific code changes in isolation. We automatically deploy test environments based on individual pull requests (PRs) and require that certain tests pass before a PR is merged into the master branch. This means that most problems can be found and fixed early, without the time-consuming research and additional testing that is often needed to track down the cause of a problem without this level of isolation.
A microservice is quite likely to be ready before the monolith that consumes it, as the entire SDLC tends to be faster for a microservice. There’s no need to coordinate between multiple teams or multiple features. You aren’t forced to use particular languages, APIs, or libraries just because the wider application already uses them and a different choice would add bloat or unnecessary complexity. You don’t have to wait for everyone else to finish their coding before you can test, nor do you have to test whether feature A and feature B work together nicely or interfere with each other. Just exercise each option of the defined interface, make sure the deployment was successful, and ensure that performance promises are met (speed, load, latency, etc.). That’s it, and once that’s been done the microservice can be deployed without regard to anything else. ZoomInfo sees this increased speed in each part of the SDLC, but it is most obvious when it comes to fixing bugs. In the past, even a rudimentary bug fix took days to work its way through the development -> test -> deploy process. Now it can take as little as an hour of work to go from first evaluation of a new bug to deploying a fix to the production environment.
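As a rough illustration of what "exercise the interface and check the performance promises" can look like, here is a minimal Python sketch. It is not ZoomInfo's actual test suite: the `search` function is a hypothetical in-memory stand-in for a call to the service, and the response fields (`total`, `results`) and the latency budget are assumptions made up for the example.

```python
import time


def search(query, limit=10):
    """Hypothetical stand-in for a request to the search service."""
    corpus = ["acme corp", "acme labs", "initech", "globex"]
    hits = [name for name in corpus if query.lower() in name]
    return {"total": len(hits), "results": hits[:limit]}


def check_contract(latency_budget_s=0.5):
    """Return a list of contract violations (empty means the contract holds)."""
    failures = []

    start = time.perf_counter()
    response = search("acme", limit=1)
    elapsed = time.perf_counter() - start

    # Interface promise: the response has exactly the agreed fields.
    if set(response) != {"total", "results"}:
        failures.append("unexpected response fields")
    # Interface promise: 'limit' caps the number of returned results.
    if len(response["results"]) > 1:
        failures.append("limit not honored")
    # Interface promise: 'total' counts all matches, not just the returned page.
    if response["total"] != 2:
        failures.append("total does not count all matches")
    # Performance promise: the call finishes within the agreed budget.
    if elapsed > latency_budget_s:
        failures.append("latency budget exceeded")

    return failures


violations = check_contract()
```

A check like this depends only on the contract, so it can run against a freshly deployed test environment without any knowledge of what the consuming applications are doing.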
Another advantage of using a microservice that lives in containers in the cloud is that many issues are fixed automatically. If an instance goes down, a new one is brought up to replace it more or less instantly, with little or no effect. If the load balancer can no longer communicate with a specific instance, it removes that instance from the list of available instances and stops sending it traffic. It’s possible that a very small number of requests are lost if the timing is just right, but generally this is a seamless process that no one even notices. In the old environment, someone had to go in and manually bring up a new machine when a machine went down. This took time and resulted in lost activity and errors that got reported back to customers.
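The load balancer behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the API of any real load balancer or of ZoomInfo's infrastructure: the `LoadBalancer` class, its method names, and the instance names are all hypothetical.

```python
class LoadBalancer:
    """Toy round-robin balancer that only routes to healthy instances."""

    def __init__(self, instances):
        self.healthy = list(instances)
        self._next = 0

    def report_health(self, instance, ok):
        # A failed health check removes the instance from rotation;
        # a passing check on a new or recovered instance adds it.
        if not ok and instance in self.healthy:
            self.healthy.remove(instance)
        elif ok and instance not in self.healthy:
            self.healthy.append(instance)

    def pick(self):
        if not self.healthy:
            raise RuntimeError("no healthy instances available")
        instance = self.healthy[self._next % len(self.healthy)]
        self._next += 1
        return instance


lb = LoadBalancer(["search-1", "search-2", "search-3"])

# search-2 stops responding, so it is dropped from rotation.
lb.report_health("search-2", False)
picks = [lb.pick() for _ in range(4)]

# A replacement instance comes up and starts receiving traffic.
lb.report_health("search-4", True)
```

In this sketch no request issued after the failed health check is routed to `search-2`; the small window of lost requests mentioned above corresponds to traffic sent between the instance failing and the check noticing.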
Adopting a microservice model means readjusting the development process at your company. This comes with its own challenges:
- People are resistant (or willing but stuck in old patterns)
- Teams who aren’t using microservices are still affected
- More communication is required
People are used to working one way and get comfortable with it. They’ve always done it that way and it works, so they don’t want to change it. This is natural, and some percentage of people will instinctively react badly to change until they get used to the new ways. Others will be willing to give things a try but will instinctively do things the old way without thinking about it. Getting buy-in from folks who need to adjust how they work is important to successfully implementing the desired change, and most folks will make the adjustment eventually. It may be that some folks are not suited to the new processes being considered or put in place. If you’re working in a hybrid system, there may be an option to let folks who are more resistant to change keep working in a legacy or monolith system, but if not, it may be necessary to make some hard decisions.
One thing that ZoomInfo hadn’t considered was the extent to which teams not working on the new search microservice were still affected by the move to microservices, even beyond the initial efforts to remove legacy search code from the monolith. Because search is a core feature of our ecosystem, many new features use search. Some of these work with the existing functionality, but others require new search functionality to be added to the microservice. Bringing the search team into the planning of these features as early as possible makes understanding their search requirements much easier and makes it more likely the search team has reserved time to work on them before they’re needed. In the straight monolith model, the same folks may have implemented the entire new feature, including the new search functionality, so they were not used to having to coordinate at this level.
Another topic that required a fair bit of debate was exactly what constitutes legitimate search functionality versus which parts of new features involving search belong in the consuming applications (be they the existing monolith or new microservices). These conversations were unexpectedly complex, and the answers were not always clear or obvious. The boundaries had to be negotiated across teams.
One example of this was the question of where search results should be filtered to provide only the data the requesting user is allowed to see. Unlike many systems where access control is by type of data, ZoomInfo limits access at the field level. Thus, two users making exactly the same search request should get the same number of results, but the fields included within each result will vary depending on their respective access rights. Some proposed that winnowing out disallowed data should happen within the search service, and others that it should happen within the monolith’s API processing code (or in any other consuming service or application). In the end, after discussing the merits of each approach, ZoomInfo decided that the search service should always return all of the available data and that the requesting service or application is responsible for applying the access rules and filtering out data the user isn’t allowed to see.
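To make the chosen division of responsibility concrete, here is a small Python sketch of consumer-side, field-level filtering. It is only an illustration under assumptions: the access tiers, field names, and the `filter_results` helper are hypothetical and do not reflect ZoomInfo's real access rules or data model.

```python
# Hypothetical access rules: the set of fields each access tier may see.
ALLOWED_FIELDS = {
    "basic": {"name", "company"},
    "premium": {"name", "company", "email", "phone"},
}


def filter_results(results, tier):
    """Strip disallowed fields from each record; never drop whole records."""
    allowed = ALLOWED_FIELDS.get(tier, set())
    return [
        {field: value for field, value in record.items() if field in allowed}
        for record in results
    ]


# What the search service returns: every available field for every match.
search_results = [
    {"name": "Ada", "company": "Acme", "email": "ada@example.com", "phone": "555-0100"},
    {"name": "Bob", "company": "Initech", "email": "bob@example.com", "phone": "555-0101"},
]

# The consuming application applies the access rules before responding.
basic_view = filter_results(search_results, "basic")
premium_view = filter_results(search_results, "premium")
```

Both views contain the same number of results; only the fields within each result differ, which matches the behavior described above.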
In addition to keeping the search team in the loop about related monolith features and deciding where the boundaries lie between functionality that belongs in the monolith (or another microservice) and functionality that belongs in the search microservice, any new search functionality needs its interfaces and performance requirements ironed out. Remember, these amount to a contract between the service and its consumers. Getting them right requires a clear understanding of the requirements, a look at other ways the same feature might be used in the future so it can be designed in a generic way that’s as future-proof as possible, and an understanding of the existing interface so extensions are consistent. New users should not be able to tell which parts of the interface were in the original version and which were added later; they should just see a consistently designed interface. The search team should also know about any internal bottlenecks or other impediments to a performant system so they can make realistic promises on this front. All of this takes additional communication and coordination.
When trying a new development model such as microservices, people tend to focus on the tools and technology. There’s no question that they are important. However, the people doing the work – and even sometimes those who aren’t – and the processes they use are equally important and need to be explicitly thought out and refined to have a successful experience. Overall ZoomInfo is pleased with our initial experiment with microservices but there were some bumpy moments to overcome.