
Real Time Event Processing at Massive Scale


By Patricio Petruzzi

September 9, 2022


At EXADS, real time events are an important data source for our SaaS platform. Every impression, click, action, bid, etc. has to be tracked and stored appropriately. Building a system that is capable of handling tens of billions of such events daily is not a trivial task. 

Events generated by our systems have to be processed and transported to different databases to help us with the following main use cases:

  • Display statistics to clients about users, sites, campaigns and any other dimension of the ads that have been shown. This includes revenue generated for publishers and costs incurred for advertisers.
  • Keep track of budgets allocated to campaigns and stop campaigns whose budget runs out. This is both critical and complex, since coordinating concurrent updates across servers and data centres is not simple.
  • Monitoring and Alert Systems to identify issues. Some technical issues can be small and hidden in vast volumes of data. Fraud attempts and misuse of the platform can also be detected this way.
  • Data Analytics. We perform regular analytics tasks with the goal of finding new business opportunities and developing new features and improvements for the platform.

Introducing our new Events Processing Pipeline

A few years ago we started to design and build this new system to replace an old monolithic application, which updated our databases by using rsync to move logs across the different servers.

The new solution had to be built with the following goals:

  • Source independent: Data can be ingested from any platform. We are migrating from a monolithic application to microservices, and the pipeline has to work with both during the transition.
  • Destination independent: We store data in multiple destinations for different purposes and must be able to use any of our databases as storage.
  • Scalable and Fault Tolerant: We have been growing steadily since our inception, so scalability is a key factor for all of our systems. Uptime matters just as much: the system must keep working 24/7 without much intervention.
  • Easy to operate: Our Infrastructure Team already deals with hundreds of servers and many systems. This new solution should simplify their work, not add to it.

 

To achieve those goals we evaluated the possibility of using a cloud-based solution, but since we already own and manage our infrastructure, adding this new system to the current setup was the most cost-effective option.

The next step was to find an open source event streaming platform that could suit our needs. Apache Kafka has evolved considerably over the last few years, proving itself to be a reliable solution for heterogeneous data streaming with outstanding throughput and performance, so it met our requirements. Finally, its robust ecosystem, especially the connectors and client libraries, future-proofs the system and facilitates the ongoing migration to a microservices architecture.

Event Producing

Our web application currently serves the right ad for each request in less than 10 milliseconds, and once an ad is selected the logs are created in JSON format. Adding extra libraries or external communication to this path could negatively impact this performance.

To avoid any impact on our web servers we decided to keep the JSON output and create a microservice running locally which collects all those messages and sends them to Kafka. This approach eliminates the need to wait for an acknowledgment from Kafka before handling the next message: the response is sent back immediately and the message is added to a local buffer. This greatly reduces the response time and improves batching and the system's overall availability. Finally, messages are sent using Avro binary encoding, reducing their size.
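As a rough illustration of this pattern (not the actual EXADS microservice), the sketch below shows a tiny local forwarder that takes JSON log lines, wraps them in an Avro record and hands them to an asynchronous Kafka producer. The topic name, the Avro schema and the field names are assumptions made for the example; the real schema is not shown in this article.

```python
# Minimal sketch of a local log forwarder: JSON in, Avro out, asynchronous produce.
# Topic name, schema and field names are illustrative, not the real EXADS ones.
import io
import json

from confluent_kafka import Producer
from fastavro import parse_schema, schemaless_writer

SCHEMA = parse_schema({
    "type": "record",
    "name": "AdEvent",
    "fields": [
        {"name": "timestamp", "type": "long"},
        {"name": "event_type", "type": "string"},
        {"name": "payload", "type": "string"},  # original JSON kept as a string
    ],
})

# The client buffers and batches messages in the background; produce() returns
# immediately instead of waiting for an acknowledgment from the brokers.
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "linger.ms": 50,               # allow some time for batching
    "compression.type": "lz4",
})

def on_delivery(err, msg):
    if err is not None:
        print(f"delivery failed: {err}")

def forward(json_line: str) -> None:
    event = json.loads(json_line)
    buf = io.BytesIO()
    schemaless_writer(buf, SCHEMA, {
        "timestamp": event["timestamp"],
        "event_type": event["type"],
        "payload": json_line,
    })
    producer.produce("ad-events", value=buf.getvalue(), callback=on_delivery)
    producer.poll(0)  # serve delivery callbacks without blocking

# On shutdown, producer.flush() drains whatever is still in the buffer.
```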

Event Format

From the start, we specified a single, uniform format for all events generated and processed. This has greatly reduced the complexity of consuming events and kept consumers and producers aligned.

Every event has a standard envelope and a payload. The envelope contains the context of the event (time, server, type of event, etc.). The payload is the event data in the same shape we had in our previous monolithic system. This enabled us to roll out the new system quickly without significant changes to the producers or to the storage of those events.
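To make the envelope/payload split concrete, a hypothetical event could look like the following; the field names are assumptions for illustration, not the actual EXADS schema.

```python
# Hypothetical event with a standard envelope plus the legacy payload.
# Field names are illustrative only.
event = {
    "envelope": {
        "timestamp": 1662717600123,   # when the event happened (ms since epoch)
        "server": "web-042",          # which server produced it
        "datacenter": "dub1",
        "event_type": "impression",   # impression, click, action, bid, ...
        "version": 1,
    },
    "payload": {
        # event data in the same shape the old monolithic system produced
        "campaign_id": 12345,
        "zone_id": 678,
        "cost": 0.00042,
    },
}
```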

Kafka Cluster

Kafka works in conjunction with ZooKeeper to form a complete Kafka cluster, with ZooKeeper providing the distributed clustering services and Kafka handling the actual data streams and connectivity to clients.

Our Infrastructure Team was already used to managing ZooKeeper clusters, which facilitated the deployment of Kafka.

As usual with our systems, we created a production and a development environment. Creating both clusters was straightforward, and they can be scaled as needed.

Moreover, AKHQ was installed to simplify Kafka management. AKHQ provides a graphical user interface for Kafka to manage topics, topic data, consumer groups, the schema registry and Kafka Connect, among other things.

Data Warehouse

At the beginning of this article we discussed how storing all those events is critical for different needs of the company, and how different solutions are used to serve those needs.

We decided to create a specific ingestor for each of the databases we use, so that each one ingests only the parts of an event that are relevant for its use case.

Our Admin Panel relies on ClickHouse to provide real time data statistics to our clients. A new tool, the ClickHouse Data Consumer, was developed to consume events from Kafka (while keeping backward compatibility for reading events from a JSON file) and ingest them into the various ClickHouse clusters (development and production).
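The sketch below illustrates the general shape of such an ingestor, not the actual ClickHouse Data Consumer: it batches events from Kafka and inserts them into an assumed stats.events table, committing offsets only after a successful insert. For brevity it assumes JSON-encoded messages and reuses the hypothetical field names from the event format example above.

```python
# Kafka-to-ClickHouse ingestor sketch (not the actual ClickHouse Data Consumer).
# Topic, table and columns are assumptions for illustration.
import json

from clickhouse_driver import Client
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "clickhouse-ingestor",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,      # commit manually, only after a successful insert
})
consumer.subscribe(["ad-events"])

clickhouse = Client("clickhouse.internal")

batch = []
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    batch.append((
        event["envelope"]["timestamp"],
        event["envelope"]["event_type"],
        event["payload"]["campaign_id"],
        event["payload"]["cost"],
    ))
    if len(batch) >= 10_000:          # ClickHouse prefers large, infrequent inserts
        clickhouse.execute(
            "INSERT INTO stats.events (timestamp, event_type, campaign_id, cost) VALUES",
            batch,
        )
        consumer.commit(asynchronous=False)
        batch.clear()
```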

Keeping track of balances is another key functionality. Another consumer was created, the Balances Data Consumer, which updates the user and campaign balances stored in MariaDB. This consumer only collects events that affect balances. Those events are then grouped together and the balance of each campaign is updated regularly, as sketched below.
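A rough sketch of that grouping idea, again under the assumptions used earlier (it is not the actual Balances Data Consumer): only cost-bearing events are kept, spend is aggregated per campaign in memory, and MariaDB is updated in one batch at a regular interval.

```python
# Sketch of the balance-update idea: filter cost-bearing events, aggregate spend
# per campaign and flush to MariaDB periodically (not the actual Balances Data Consumer).
import json
import time
from collections import defaultdict

import mysql.connector
from confluent_kafka import Consumer

BILLABLE = {"impression", "click", "action"}   # assumed set of events that affect balances
FLUSH_INTERVAL = 10                            # seconds between balance updates

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "balances-consumer",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["ad-events"])

db = mysql.connector.connect(
    host="mariadb.internal", user="billing", password="secret", database="billing"
)

spend = defaultdict(float)
last_flush = time.monotonic()

while True:
    msg = consumer.poll(1.0)
    if msg is not None and not msg.error():
        event = json.loads(msg.value())
        if event["envelope"]["event_type"] in BILLABLE:
            spend[event["payload"]["campaign_id"]] += event["payload"]["cost"]

    if spend and time.monotonic() - last_flush >= FLUSH_INTERVAL:
        cursor = db.cursor()
        cursor.executemany(
            "UPDATE campaigns SET balance = balance - %s WHERE id = %s",
            [(amount, campaign_id) for campaign_id, amount in spend.items()],
        )
        db.commit()
        cursor.close()
        spend.clear()
        last_flush = time.monotonic()
```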

Finally, one last consumer was developed, the HDFS Data Consumer, to upload events to Hadoop. This cluster is used for data analytics and needs to store every event we generate.
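As with the other consumers, this is only a minimal sketch of the idea, assuming the WebHDFS-based hdfs Python client, JSON-encoded messages as in the earlier sketches and a simple per-day file layout; none of these choices is confirmed by the article.

```python
# Kafka-to-HDFS sketch: append each raw event to a per-day file
# (not the actual HDFS Data Consumer; paths and layout are assumptions).
from datetime import datetime, timezone

from confluent_kafka import Consumer
from hdfs import InsecureClient          # WebHDFS client from the `hdfs` package
from hdfs.util import HdfsError

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "hdfs-consumer",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["ad-events"])

hdfs = InsecureClient("http://namenode.internal:9870", user="analytics")

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    path = f"/data/events/{day}/events.jsonl"
    record = msg.value() + b"\n"
    try:
        hdfs.write(path, data=record, append=True)
    except HdfsError:
        # First record of the day: the file does not exist yet, so create it.
        hdfs.write(path, data=record)
    # A real ingestor would buffer and write in large batches instead of per event.
```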

Conclusions and Future Work

It was paramount to build an event processing system that could fulfil our current and future needs, paying special attention to its efficiency and scalability. Choosing the right building blocks is essential in this kind of task and Apache Kafka has proven to be an excellent solution for this enterprise.

There is still a lot we want to achieve and we are already discussing our next steps. We are analysing how we could split events into different messages and use a schema registry to store the different versions of those messages. This will make us more flexible when it comes to new releases and facilitate new deployments.

 
