Why We Moved to Apache Pulsar

At Narvar we interact with millions of consumers every day through the retailers we serve. Consumers rely on us for timely reminders and updates on their accounts, on the state of their orders and packages, and on the status of their returns. At the same time, our business customers (hundreds of the biggest retailers and brands in the world) rely on us for the flexibility to configure Narvar’s product to the needs of their consumers.

Narvar’s platform helps retailers and brands by processing data and events to ensure timely and accurate communication with their customers. To handle that, we built our platform using a variety of messaging and processing technologies over time, from Kafka to Amazon SQS, Kinesis Streams to Kinesis Firehose, RabbitMQ to AWS Lambda. Many of these systems were native to AWS and easy to adopt and get up and running. These systems worked well and they served our needs…for a while.

Our Challenges

As the company grew and new business use cases emerged, we found that these systems no longer met the new and expanded requirements. As our customer base grew, it became clear that we could not easily process events in configurable ways that scaled with it. As more applications needed to consume events, we had to either spin up new services, each with its own maintenance overhead, or rely on expensive AWS Lambda functions. For other business cases, Kinesis Firehose did not provide the flexibility to output data in the form and structure required. And as a growing number of use cases required strong in-order guarantees and topic scalability, it became clear that many of the solutions we were using were not suited to those challenges.

In addition to these challenges from emerging business use cases, we also saw challenges of scale. As our traffic grew, it became apparent that the growing amount of DevOps and developer support required to maintain and scale these systems was unsustainable. Many of them were not containerized, which made infrastructure configuration and management burdensome and required frequent manual intervention.

Systems like Kafka, while reliable, popular and open source, carried significant maintenance overhead as we scaled. For example, increasing throughput meant adding partitions and tuning consumers, both of which required a large amount of manual intervention by developers and DevOps. At the same time, solutions like Kinesis Streams and Kinesis Firehose were not cloud-agnostic, which made it hard to decouple functionality from the choice of cloud provider, to leverage technologies in other clouds, and to support customers who needed to run in other clouds.

Faced with the challenges of these emerging business cases and the hassles that we were encountering as we scaled, we decided to move to Apache Pulsar.

Why Apache Pulsar

Like Kafka, Pulsar was reliable, cloud-agnostic and open source. Unlike Kafka, Pulsar entailed very little maintenance overhead and scaled with minimal manual intervention. Pulsar was containerized and built to run on Kubernetes (i.e., ‘cloud native’) from the outset, and while it had multiple complex components within, it was easy to spin up and scale with the help of the Streamlio team. Version upgrades were easy to apply to a given cluster with little downtime. Streamlio also provided us with system monitoring dashboards (based on Grafana), making it easy to monitor out of the box.

In addition to being scalable and much more maintainable, Pulsar came with differentiating features that made several of our business use cases possible. For example, we gained access to Pulsar Functions, which helped us scale up the number of things we did with the consumed events while eliminating the need for expensive Lambda functions or standing up additional services. Stronger support for in-order guarantees within a topic enabled several more business use cases.
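To give a sense of what a Pulsar Function can look like, here is a minimal sketch written as a "native" Python function, the form in which Pulsar calls a plain process(input) function for each message on the input topic and publishes the return value to the output topic. It is illustrative only and not Narvar's production code: the event fields (order_id, status, retailer), the NOTIFY_STATUSES set, and the assumption that payloads arrive as JSON strings are all hypothetical.

```python
import json

# Hypothetical set of order statuses that should trigger a consumer notification.
NOTIFY_STATUSES = {"shipped", "out_for_delivery", "delivered"}


def process(input):
    """Per-message entry point, assuming the payload is a JSON string."""
    event = json.loads(input)
    status = event.get("status")

    # Returning None is assumed to mean "publish nothing" for this event.
    if status not in NOTIFY_STATUSES:
        return None

    # Build the notification that would be published to the output topic.
    notification = {
        "order_id": event["order_id"],
        "retailer": event["retailer"],
        "message": "Your order is " + status.replace("_", " "),
    }
    return json.dumps(notification)


if __name__ == "__main__":
    # Local check without a Pulsar cluster: call the function directly.
    sample = json.dumps({"order_id": "A1", "status": "shipped", "retailer": "acme"})
    print(process(sample))
```

Because the function is plain Python with no Pulsar imports, it can be exercised locally before it is deployed to a cluster, and the same logic would otherwise need a Lambda function or a standalone consumer service.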

Finally, Pulsar provided a broad set of features in one system, eliminating the need for many of the point solutions we had been using. It replaced multiple messaging technologies: Kafka for pub/sub, and RabbitMQ and Amazon SQS for queueing. We also no longer needed Kinesis Streams for processing or Kinesis Firehose to load streaming data into data stores. Moving from this list of technologies to Pulsar helped us reduce cost and complexity and made it easier to support other cloud infrastructure providers. Narvar has been using Pulsar live in production for close to a year now, and it has proven to be a very reliable workhorse.

 

Contributors to this article:

Gnana Prakash, Sherwin Pinto, Rajan Vijaykumar, Tommy Meusburger, Corey Hall

Come see Anand Madhavan chat about Apache Pulsar with Karthik Ramasamy of Streamlio at the O’Reilly Strata conference in New York on September 26.