Inside Arpeely’s Real-Time Feature Store for Ad Personalization
- Alexander Patino
- 4 days ago
- 5 min read
If you want to see Roee Sheffer's presentation at the Speed & Scale: Engineering for Performance meetup, you can check it out here.
Written by Alexander Patino at Aerospike
Arpeely is in the business of decisions. Fast ones. Every day, the company makes billions of them across programmatic ad auctions, each one a split-second choice of who to target, what to show, and how much to bid. Behind that speed is a data infrastructure built for real-time machine learning (ML) inference, personalization, and reliability at massive scale.
“We try to operate incredibly quickly and at a massive scale,” said Roee Sheffer, VP of R&D at Arpeely, who walked through the system that powers all this. It centers on a high-performance feature store built on Aerospike, one that serves more than 300 features per request, each returned in under 3 milliseconds.
From Mad Men to millisecond markets
Forget billboards. This is performance marketing in its purest, fastest form. “We’re in the AdTech space. Not like what you saw in Mad Men, but the real-time, algorithmic version,” Sheffer said.
At Arpeely’s scale, every incoming auction triggers three decisions:
Where to advertise
What ad to show
How much to bid
“The first thing: Where do I want to advertise? To which users? In which app? At what time? The second thing is: What to advertise? I have a set of clients to choose from. For the selected client, I have to choose what ad or experience I want to display. Then I need to decide how much to actually bid in the auction,” said Sheffer.
And each bid isn’t just reactive; it’s calculated against expected value. “Say I expect to make a dollar in return, I might bid a dollar, or maybe less, so I make a profit... It’s also a bit of game theory...” Sheffer explained.
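The bid-shading idea Sheffer describes can be sketched in a few lines. This is a minimal illustration, not Arpeely's actual pricing logic; the `margin` knob and function name are hypothetical stand-ins for whatever game-theoretic strategy the team really runs.

```python
def shaded_bid(expected_value: float, margin: float = 0.2) -> float:
    """Bid below the predicted return so a won auction stays profitable.

    expected_value: predicted revenue (in dollars) if the impression is won.
    margin: fraction of expected value retained as profit (illustrative).
    """
    if expected_value <= 0:
        return 0.0  # no expected return, so do not bid at all
    return round(expected_value * (1.0 - margin), 4)

# Expecting $1.00 back, bid $0.80 and keep a 20-cent margin if we win.
print(shaded_bid(1.00))  # 0.8
```

In a real exchange the margin would itself be tuned against win-rate data, since bidding too far below expected value loses auctions that would have been profitable.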
Billions of decisions, measured in milliseconds
Arpeely sees more than a billion auctions every day. “We’re talking over a million requests per second. Actually, more, depending on the day. And within all those requests, we’re dealing with about a billion unique users across a 30-day period. And remember, every auction closes in around 100 milliseconds.”
Those 100 milliseconds include model scoring, feature fetching, and the full bidding decision. And it needs to happen at scale. “If I need a feature from a database, it needs to return results in less time, ideally under 10 milliseconds.”
Personalization powered by a real-time feature store
Decisions are model-driven, but the models are only as good as the features they ingest. Arpeely built a real-time feature store to serve hundreds of features per request, derived from both live and historical user behavior. “Our real-time feature store serves about 300 real-time features for our models during bidding that help us make decisions,” Sheffer said.
Those features include behavioral data such as ad history, purchase behavior, and behavioral tags. “I’d like to know what ads have already been shown to a certain user,” Sheffer said. “Is the user a gamer, are they a shopper, have they ever purchased something before? In the end, if a user ever made a purchase, that’s a pretty significant feature.”
Just as important as what the feature store returns is how quickly it does so. “I want a feature store with a good UI, a good API, that’s fast and reliable,” Sheffer said. “It needs to answer millions of queries per second with low latency.”
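The 10-millisecond budget mentioned above shapes how a lookup is wired into the bidding path: if the feature fetch overruns, the bidder must fall back rather than miss the auction. Here is a minimal sketch of that pattern, using a plain in-memory dict as a hypothetical stand-in for the networked key-value read (in production this would be, e.g., an Aerospike get); the key format and feature names are invented for illustration.

```python
import concurrent.futures

# Hypothetical in-memory stand-in for the feature store's key-value records.
FEATURES = {
    "user:42": {"is_gamer": 1, "has_purchased": 1, "ads_seen_24h": 7},
}

def fetch_features(user_key: str, budget_s: float = 0.010) -> dict:
    """Fetch a user's feature record within a latency budget (10 ms here).

    If the lookup exceeds the budget, return an empty record so the model
    can still score with default features instead of timing out the bid.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(FEATURES.get, user_key, {})
        try:
            return future.result(timeout=budget_s)
        except concurrent.futures.TimeoutError:
            return {}  # degrade gracefully rather than miss the auction

print(fetch_features("user:42")["has_purchased"])  # 1
```

The point of the sketch is the timeout boundary, not the storage: whatever database sits behind the call, the bidder treats a slow answer the same as a missing one.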
Machine learning that adapts in real time
Arpeely’s decisioning runs on CatBoost, an open-source machine learning library that implements a gradient boosting algorithm, and deep learning models, all optimized for real-time inference.
The team doesn’t operate in long model cycles; every change goes through live A/B testing, giving them fine-grained feedback on feature effectiveness and model performance. “Almost every change we make ends up as an A/B test, from the color of a button to which feature or model we use,” Sheffer said. “We compare performance. We’re a data-driven company, always looking to improve company performance incrementally.”

This creates a tight feedback loop: new features are created, tested, and deployed fast, with minimal engineering overhead. “We wanted to reduce the time it takes from idea to implementation. We wanted to allow data scientists to do this independently,” he explained. “The data scientist can go in, create their own features, and make them available for training and for production.”
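Running A/B tests at a million requests per second rules out storing assignments per user; a common approach (a sketch under assumptions, not necessarily Arpeely's implementation) is to hash the user and experiment IDs so assignment is deterministic, stateless, and independent across experiments. All names here are illustrative.

```python
import hashlib

def ab_bucket(user_id: str, experiment: str,
              treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to 'control' or 'treatment'.

    Hashing (experiment, user_id) keeps the assignment stable across
    requests without any lookup, and salting with the experiment name
    decorrelates bucketing between concurrent experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    # Map the first 8 bytes of the digest to a uniform float in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if fraction < treatment_share else "control"

print(ab_bucket("user-123", "new-purchase-feature"))
```

Because the same inputs always hash to the same bucket, the bidder can evaluate the assignment inline on every auction with no extra feature-store read.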
The feature store behind Arpeely’s millisecond decisions
Arpeely’s decision to use Aerospike wasn’t about theory, but about hitting performance benchmarks that most databases couldn’t touch. With more than 1 billion auctions per day, 300+ real-time features, and latency budgets under 10 milliseconds, its infrastructure needed to be purpose-built for high-throughput, low-latency operations at scale.
Aerospike not only met that requirement, it beat it. “Today, we run a cluster,” Sheffer said. “It handles 100,000 writes per second, 2.4 million reads per second, time series data for users, retrieved in under 3 milliseconds. Our latency right now is under 3 milliseconds for 99% of queries.” That sub-3ms latency, even under extremely high read volume, makes it possible to serve hundreds of features per user, on demand, without cutting into model execution time.
But raw performance isn’t the only reason Arpeely standardized on Aerospike. It also reduced the engineering complexity around scaling. “The beauty of it is that it scales really well,” Sheffer said. “Aerospike is really efficient in terms of CPU and RAM consumption. You don’t need a massive cluster to do something that’s really impressive.”
That efficiency translated into cost savings and architectural simplicity. Arpeely didn’t need to overprovision to hit its service level agreements; the performance curve was flat enough to scale predictably as the business grew. “The system can take a lot more load than we’re currently pushing it,” Sheffer said. “We’re not close to its limits. That’s a good thing.”
It also gave the team confidence to push features into production faster, knowing that latency wouldn’t become a bottleneck during real-time inference. From an operational standpoint, Aerospike also reduced the load on Arpeely’s site reliability and infrastructure engineers. The system ran predictably under pressure, and maintenance overhead was low. “It’s very stable. We’ve had very few issues,” Sheffer said. “It’s been really good for our production use cases.”
Why Arpeely’s ML models run faster and smarter
All of this made Aerospike not just a fast database, but the foundation of Arpeely’s real-time ML architecture. Without it, the feature store wouldn’t be fast enough to serve the models. And without those models, Arpeely couldn’t bid with the speed or intelligence its business demands.
For Arpeely, personalization isn’t optional; it’s the whole product. Every auction is a chance to make a better choice. To do that, every part of the system, from feature store to ML model to infrastructure, has to move fast. “We’re only limited by the data we have, and how fast we can access it,” Sheffer said. “With Aerospike, that limit keeps expanding.”