How to Build a Stable and Resilient High-Load System: An Interview with Vadym Semeniuk from Illumin

Working with complex high-load systems requires more than just ensuring uptime—you need to maximize performance while minimizing latency and resource consumption. Vadym Semeniuk, a Senior Software Engineer at Illumin, is a seasoned expert in designing and optimizing large-scale platforms.

In this interview, Vadym shares how Illumin built a highly performant and fault-tolerant programmatic advertising platform, and what trends are shaping the future of high-load system development.

Tell us about your career journey and your experience with high-load systems—especially your current role at Illumin.

I’ve worked across a variety of industries and projects, but when it comes to high-load systems, Illumin really stands out. The scale we operate at is uncommon even by tech industry standards.

Illumin is an omnichannel programmatic advertising platform. Behind the scenes, it’s a deeply complex system. We analyze user reactions to personalized ad delivery, which means we process millions of requests per second, between 70 and 100 billion requests per day. That kind of volume demands extreme stability, speed, and resilience.

The platform must instantly interpret incoming user signals and return the most relevant ad. The end-to-end process—from page request to ad display—takes around 100 milliseconds. To maintain that performance, we invest heavily in resource efficiency and optimization.

How does your work directly impact the effectiveness of your clients’ advertising campaigns?

My primary focus is system stability and speed. If an ad fails to load quickly, users bounce—and our clients lose money. That can’t happen.

Another critical part of my role is building real-time tracking systems. These allow advertisers to monitor campaign performance: who’s engaging, who’s not, and which audience segments are converting best. This visibility enables them to optimize targeting and creatives on the fly, preventing budget waste.

Processing up to 100 billion requests daily is no small feat. What architectural decisions made this possible?

We don’t rely on public cloud providers—we run entirely on on-premises infrastructure. That gives us full control over hardware-level tuning and performance optimization—something you can’t easily achieve in the cloud.

Of course, that level of control requires a deep understanding of how code interacts with hardware. That’s where I come in: not just writing performant code, but seeing the full picture of how systems behave under real-world load.

We host our own servers across multiple leased data centers, balancing traffic by cluster, geography, and traffic type. Incoming requests pass through several layers of processing—load balancers, filters, decision engines, logging, and metric collection—before the ad decision is made.

The system is highly configurable. For example, a client can target ads only to BMW-driving males within a specific income bracket and zip code. All the relevant data is pre-loaded and pre-sorted to allow ultra-fast retrieval.
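As a rough sketch of what “pre-loaded and pre-sorted” means in practice (the class and field names below are illustrative, not our actual code), a campaign’s allowed values can be sorted once at load time so the hot path only does binary searches, with no database call:

```java
import java.util.Arrays;

/**
 * Illustrative sketch of pre-loaded, pre-sorted targeting data: allowed
 * values are sorted once when campaign settings load, so the hot path can
 * answer "does this campaign target this user?" with binary searches only.
 * Class and field names are hypothetical.
 */
public class CampaignTargeting {

    private final int[] allowedZipCodes;     // sorted at load time
    private final long[] allowedSegmentIds;  // e.g. "BMW driver", income bracket; sorted

    public CampaignTargeting(int[] zipCodes, long[] segmentIds) {
        this.allowedZipCodes = zipCodes.clone();
        this.allowedSegmentIds = segmentIds.clone();
        Arrays.sort(this.allowedZipCodes);    // pre-sort once, off the hot path
        Arrays.sort(this.allowedSegmentIds);
    }

    /** Hot-path check: a handful of binary searches, no I/O. */
    public boolean matches(int userZip, long[] userSegments) {
        if (Arrays.binarySearch(allowedZipCodes, userZip) < 0) {
            return false;
        }
        for (long segment : userSegments) {
            if (Arrays.binarySearch(allowedSegmentIds, segment) >= 0) {
                return true;                  // user belongs to at least one targeted segment
            }
        }
        return false;
    }
}
```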

We also minimize synchronous service-to-service calls during critical execution paths. Anything that’s not essential to the real-time response is deferred—like analytics post-processing.
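A minimal illustration of that deferral pattern (hypothetical names, not our real services) is handing analytics events to a background worker so the request thread never waits on them:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Illustrative sketch of keeping non-essential work off the critical path:
 * the request thread only enqueues an analytics event; a background thread
 * does the actual post-processing later. Names are hypothetical.
 */
public class DeferredAnalytics {

    private final BlockingQueue<String> events = new ArrayBlockingQueue<>(100_000);

    public DeferredAnalytics() {
        Thread worker = new Thread(this::drainLoop, "analytics-worker");
        worker.setDaemon(true);
        worker.start();
    }

    /** Called on the hot path: never blocks; drops the event if the buffer is full. */
    public void record(String event) {
        events.offer(event);
    }

    /** Runs off the critical path: consumes events at its own pace. */
    private void drainLoop() {
        try {
            while (true) {
                String event = events.take();
                // ... aggregate, batch, and forward to the analytics pipeline ...
                System.out.println("processed: " + event);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```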

Every optimization we implement goes through a solution review, where I’m heavily involved. We validate ideas not just for business logic, but for compatibility with our complex architecture.

Can you share some specific system improvements you’ve led?

Two highlights come to mind.

First, I optimized our data pre-processing pipeline by reworking the algorithm and approach. What previously took 90 minutes now runs in just 3 minutes.

Second, I implemented a real-time tracking system. Every incoming request is evaluated through a series of filters—geolocation, demographics, ad type, and so on. This system can pinpoint campaign underperformance, such as overly narrow targeting or ineffective site selections.

The scale of this was staggering. Processing 70–100 billion daily requests across thousands of campaigns generates tens or even hundreds of trillions of events daily. That required integrating a data aggregation and compression engine capable of handling it all efficiently.

Handling millions of requests per second is an enormous challenge. How did you architect for stability under peak load?

We use a layered architecture. The first layers handle lightweight filtering operations, often in the microsecond range. That alone eliminates up to 50 billion irrelevant requests a day.

The remaining requests are pre-filtered before moving to deeper layers. Most heavy operations are asynchronous—precomputed or deferred. We don’t query the database in real time; everything is prepared in advance.
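In spirit, the layering looks something like the sketch below (the stage ordering and names are hypothetical): the cheapest predicates run first and short-circuit, so heavier stages only ever see the traffic that survives.

```java
import java.util.List;
import java.util.function.Predicate;

/** Illustrative layered filter: cheapest checks first, heavier stages only for survivors. */
public final class LayeredFilter<T> {

    private final List<Predicate<T>> stagesCheapestFirst;

    /** Stages must be supplied cheapest-first for the early exit to pay off. */
    public LayeredFilter(List<Predicate<T>> stagesCheapestFirst) {
        this.stagesCheapestFirst = List.copyOf(stagesCheapestFirst);
    }

    public boolean accept(T request) {
        for (Predicate<T> stage : stagesCheapestFirst) {
            if (!stage.test(request)) {
                return false;   // rejected early; no later stage runs for this request
            }
        }
        return true;
    }

    // Example wiring (hypothetical checks):
    // LayeredFilter<Request> filter = new LayeredFilter<>(List.of(
    //     r -> !r.isBot(),               // microsecond-level check
    //     r -> geoIndex.isTargeted(r),   // cheap precomputed lookup
    //     r -> segments.match(r)));      // heavier, runs on far fewer requests
}
```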

We also rely on ultra-lightweight data formats, carefully balancing network bandwidth, memory usage, and serialization overhead. It’s a constant trade-off between transmission speed and resource cost.

Tell us more about your data compression and aggregation pipeline. How did you reduce bandwidth by such a dramatic margin?

It’s a three-stage pipeline:

  1. Data aggregation — We group similar events to avoid repetition. For instance, if we receive thousands of identical actions, we aggregate them before further processing.
  2. Data packing — We transform the data into numeric sequences with pattern-based encoding. Think of turning “1111” into “4 1”, a simple run-length encoding.
  3. Compression — We use Protobuf, a compact, schema-driven binary format, for fast serialization and minimal payload size. It’s interoperable across languages and services. We also wrote custom Protobuf handlers to extend its efficiency.

We built a dedicated library for this entire process—compressing on one end, decompressing on the other. Without this system, we’d be processing about 90 GB/sec; now it’s around 1 GB/sec.
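To make the first two stages concrete, here is a minimal sketch (hypothetical names, not our production library) of aggregating identical events and run-length packing a numeric sequence; the Protobuf serialization stage is left out:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch of the first two pipeline stages: aggregate identical
 * events into counts, then run-length-encode a numeric sequence (the
 * "1111" -> "4 1" idea). The real pipeline then serializes the result with
 * Protobuf; that stage is omitted here. All names are hypothetical.
 */
public class EventPacker {

    /** Stage 1: collapse repeated identical events into (event, count) pairs. */
    public static Map<String, Long> aggregate(List<String> events) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String event : events) {
            counts.merge(event, 1L, Long::sum);
        }
        return counts;
    }

    /** Stage 2: run-length encoding of a numeric sequence: 1,1,1,1 -> (4,1). */
    public static List<long[]> pack(long[] values) {
        List<long[]> runs = new ArrayList<>();
        int i = 0;
        while (i < values.length) {
            int runStart = i;
            while (i < values.length && values[i] == values[runStart]) {
                i++;
            }
            runs.add(new long[] { i - runStart, values[runStart] });  // {count, value}
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(aggregate(List.of("click", "click", "view")));  // {click=2, view=1}
        for (long[] run : pack(new long[] { 1, 1, 1, 1, 7 })) {
            System.out.print(run[0] + "x" + run[1] + " ");                 // 4x1 1x7
        }
    }
}
```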

These are not beginner-level problems. Did you always plan to work on high-performance systems?

Not really. I come from an engineering background, and I’ve always been fascinated with how machines work under the hood.

Over time, I became more interested in what’s under the surface of business logic—how to make systems faster, smarter, more efficient. That curiosity naturally led me into high-load system design.

Illumin is a perfect fit for that interest. The more complex and large-scale the system, the more interesting the technical challenges become.

My field is not just about technical skills—it’s also about trade-offs. We’re constantly weighing speed vs system load, and finding the best balance.

You’ve also judged several hackathons, like Raptors AI-Powered Mental Wellness Support Chatbot and IAHD Lifestyle Evolution. What do you look for when selecting winners?

At hackathons, you get to see radically different projects and ideas—some of which you wouldn’t even think of yourself. It broadens your perspective and often leads to new insights. When evaluating participants, I focus on whether the project is technically sound. It’s important for me to see that the participant didn’t just put together a polished presentation, but actually dove into building a real product.

I also look closely at the architectural approach. Even though hackathons are all about building quickly under tight deadlines, you can still tell whether a team considered how their solution would work in the real world and whether it could scale.

Another key factor is code quality. Sometimes a participant delivers a working prototype, but the code underneath is a mess. Other times, the MVP might not be complete, but the structure clearly shows the developer’s underlying vision.

And of course, I evaluate creativity. It’s genuinely valuable when you see a project and realize the team didn’t just copy an existing solution, but tried to rethink it and add something of their own.
