Maximize Profit - Long-Tail Merchandising That Works

The e-commerce merchandising problem

In every online store, the core merchandising challenge is deciding what to show above the fold and how to order products so that we:

  • maximize sales now (exploitation)
  • do not kill future sales by never giving new or low-volume products a chance (exploration)
  • avoid false conclusions from small samples

This problem becomes especially obvious when you have a true long tail - a very large catalog where each product type contains many low-volume items. In that setting, “good merchandising” is not about picking a handful of favorites. It is about finding a reliable way to surface the right products from a huge set, without manual pinning and without relying on intuition.

In other words, the point of this article is to describe a practical merchandising and ranking approach that works specifically for the long tail within each product type: it gives new and rarely purchased products a fair chance to appear, then quickly corrects if demand is not confirmed.

It is worth calling out what typically fails in practice. Some teams rely on manual “bestseller” lists and pin specific products to fixed positions. That often freezes the page around internal opinions and stops the system from learning. Other teams build feature-heavy models that look sophisticated, but with sparse per-product data they frequently turn into “noise in - noise out,” especially when traffic is polluted by bots and tracking artifacts. A more reliable direction is uncertainty-aware ranking (Bayesian and bandit-style thinking), which accepts a hard truth of e-commerce long tail: most products do not have enough clean history to deserve a single deterministic position forever.

My approach is a practical synthesis that keeps what works and removes what breaks in production. I prioritize signal quality over signal volume, use a strict no-weights cascade of strong signals, structure the page as zones and shelves with simple controlled randomness, and compute everything in a moving window of roughly 100 days.

There is no universal, deterministic sorting algorithm that is always “right.” Demand shifts, promotions change behavior, prices and stock change conversion, content improves, and seasonality regularly reshapes the landscape.

My rule from practice: simpler is almost always better

One strong takeaway from real e-commerce work is that the best practical models usually rely on one or two truly strong signals.

A common mistake is trying to build a “smart” model from dozens of factors. In practice that often makes the model worse.

Noise in - noise out

If weak, dirty, unstable signals go into the model, the output is not “intelligence” - it is reliably produced noise.

  • weak features often have low statistical reliability
  • noise masks real patterns
  • overfitting to accidental correlations becomes likely, especially with sparse per-product data
  • explainability drops and debugging becomes painful

So I keep the model deliberately simple and focus on data quality.

Filtering matters more than volume

Filtering is critical. Even if we discard a meaningful share of events, we reduce the probability of false alarms. The system learns from cleaner observations of real shoppers rather than from random bot activity and tracking artifacts.

The approach: zones + bandit logic + controlled randomness

I do not build a single, global sort order. Instead, I build the page as a composition of zones.

  • The screen (and then the rest of the listing) is split into zones.
  • Each product gets a probability score based on observed behavior, using bandit-style thinking that accounts for uncertainty.
  • Products are assigned into zones based on that score.
  • Within a zone, order is randomized in a simple way so the ranking does not freeze and continues collecting fresh data.

This turns the listing into a policy: the above-the-fold zone favors proven demand, while lower zones reserve space for exploration.

Why naive probability fails

The most obvious metric is conversion:

p_naive = purchases / views

It looks reasonable, but in real merchandising it fails because it ignores uncertainty.

A simple example

Product A:

  • views = 100
  • purchases = 10
  • p_naive = 10/100 = 0.10

Product B:

  • views = 10
  • purchases = 1
  • p_naive = 1/10 = 0.10

Naively, they look equal. Statistically, they are not.

  • Product A has many observations. We have decent confidence its true probability is around 10 percent.
  • Product B has few observations. One purchase out of ten could mean the true probability is 5 percent, 10 percent, 20 percent - we simply do not know yet.

If you rank by p_naive alone, the system tends to:

  • become overconfident on tiny samples,
  • freeze on incumbents that already have traffic, or
  • oscillate unpredictably if you try to compensate with ad-hoc rules

The correct framing is: we need a probability estimate that reflects both the signal and the uncertainty of that estimate.
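
To make the intuition concrete, here is a minimal sketch (Python, stdlib only) that compares a pessimistic lower bound of the posterior conversion rate for the two products. The uniform Beta(1, 1) prior, the function name, and the 5th-percentile choice are illustrative assumptions, not part of any specific production system:

```python
import random

def beta_lower_bound(purchases, views, q=0.05, n_samples=20000, seed=0):
    """Estimate the q-th quantile of the Beta(1 + purchases,
    1 + views - purchases) posterior by Monte Carlo sampling.
    The uniform Beta(1, 1) prior is an illustrative assumption."""
    rng = random.Random(seed)
    a = 1 + purchases            # prior + observed purchases
    b = 1 + views - purchases    # prior + observed non-purchases
    samples = sorted(rng.betavariate(a, b) for _ in range(n_samples))
    return samples[int(q * n_samples)]

# Product A: 10 purchases / 100 views. Product B: 1 purchase / 10 views.
lb_a = beta_lower_bound(10, 100)  # stays close to the observed 10%
lb_b = beta_lower_bound(1, 10)    # much lower: the sample is tiny
```

Ranking by such a lower bound separates A and B even though their naive conversion rate is identical.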

Bandit-style probability scoring

1) The signals I use and why

My system uses three behavioral signals:

  • purchases
  • product page views
  • add-to-cart

I intentionally do not use listing impressions as the primary exposure metric.

Reason 1: accurate listing-impression counting is messy. The same product can appear in many different listings, blocks, filters, and search results, turning the metric into a noisy engineering problem.

Reason 2: traffic quality. One of the biggest challenges today is separating real shoppers from bots. Bots often hit product pages directly (search crawlers, scrapers) and do not behave like shoppers.

So I count product page views only when the pageview came from an internal listing or internal search. Direct and external landings on a product page (including search engine robots) should not drive ranking.

In addition, I apply session-level filters (for example, excluding known IPs, geography constraints) to keep the data closer to real buyer behavior.

I also use two simple, practical noise filters:

  • if a session views too many products (for example, more than 40), I exclude it from statistics. This often indicates scanning, scraping, or unnatural behavior.
  • I exclude one-off views (when a product was viewed only once). That signal is too weak and often pure noise.
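
The two filters above can be sketched roughly like this (stdlib only; the event shape, threshold constant, and function name are assumptions for illustration, and the events are assumed to have already passed the internal-listing/internal-search source check):

```python
from collections import Counter

MAX_VIEWS_PER_SESSION = 40  # above this, the session looks like a scanner

def qualified_views(events):
    """events: list of (session_id, product_id) product-page views.
    Applies the two noise filters from the text:
      1. drop whole sessions with too many product views
      2. drop products that were viewed only once in the window
    Returns a Counter of product_id -> qualified view count."""
    per_session = Counter(s for s, _ in events)
    kept = [(s, p) for s, p in events
            if per_session[s] <= MAX_VIEWS_PER_SESSION]
    counts = Counter(p for _, p in kept)
    return Counter({p: n for p, n in counts.items() if n > 1})
```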

2) My no-weights cascade

I do not use weights or blended formulas. Weighting makes the model more complex and creates room for fragile tuning.

Instead, I use a strict cascade by signal strength:

  • If a product has purchases in the window, purchases are the primary signal.
  • If there are no purchases but there are add-to-cart events, add-to-cart becomes the primary signal.
  • If there are only qualified product page views, views represent interest.
  • If there are no signals at all, products fall into a zero-signal bucket and are shown in random order.

This avoids mixing funnel levels into a single number. A product either proved itself with purchases, showed intent via cart, or at least collected interest via qualified views. Products with no signals do not contaminate the ranking.
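
A minimal sketch of the cascade, assuming simple per-product counts in the window (names and the random tie-breaking detail are illustrative):

```python
import random

def primary_signal(purchases, carts, views):
    """Strict no-weights cascade: purchases beat add-to-cart,
    add-to-cart beats qualified views; no signal at all falls into
    the zero-signal bucket. Lower tier = stronger funnel level."""
    if purchases > 0:
        return (0, purchases)
    if carts > 0:
        return (1, carts)
    if views > 0:
        return (2, views)
    return (3, 0)  # zero-signal bucket: shown in random order

def rank_within_type(products, rng=random):
    """products: dict product_id -> (purchases, carts, views).
    Sort by cascade tier, then count descending; random jitter breaks
    ties (and orders the zero-signal bucket) so nothing freezes."""
    def key(item):
        tier, count = primary_signal(*item[1])
        return (tier, -count, rng.random())
    return [pid for pid, _ in sorted(products.items(), key=key)]
```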

3) Uncertainty-aware scoring (the intuition)

The key benefit of bandit-style thinking is that it naturally gives a chance to products with limited history while still correcting quickly when the signal does not hold.

This “uncertainty-aware” behavior is why Bayesian ranking approaches are widely used in large e-commerce. See a real production example: https://www.aboutwayfair.com/tech-innovation/bayesian-product-ranking-at-wayfair, and a practical bandit overview: https://medium.com/@FunCorp/features-of-practical-use-of-various-algorithms-of-the-multi-armed-bandit-4da26d343693.
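
The text does not prescribe one exact algorithm, but the uncertainty-aware idea can be illustrated with Thompson sampling over a Beta posterior - a common bandit-style scoring choice. Everything below (prior, counts, names) is an assumption for illustration:

```python
import random

def thompson_score(successes, trials, rng):
    """One draw from a Beta(1 + successes, 1 + failures) posterior.
    Short history -> wide posterior -> occasional high draws
    (exploration); long history -> narrow posterior near the
    observed rate (exploitation)."""
    return rng.betavariate(1 + successes, 1 + trials - successes)

rng = random.Random(42)
# Established product: 10 purchases / 100 qualified views.
# Newcomer: 1 purchase / 1 qualified view - a strong but tiny signal.
wins = sum(
    thompson_score(1, 1, rng) > thompson_score(10, 100, rng)
    for _ in range(1000)
)
newcomer_wins = wins / 1000  # the newcomer earns real exposure
```

If the newcomer is then shown several more times without purchases, its posterior collapses toward a low rate and it stops winning draws - exactly the fast lift, fast cool-down behavior described below.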

Fast lift and fast cool-down for new and low-volume products

A practical effect of this approach is that it can lift a product quickly on a small but strong signal - and cool it down just as quickly if that lift is not confirmed.

For example, a product might be viewed once and purchased once. That is a strong early signal, so it can get a quick bump and start receiving exposure - which is crucial for new products and low-volume products.

But the system self-corrects fast: if the product is shown several more times and is not purchased, its estimate cools down and it drops. The chance comes fast, but staying at the top requires confirmation by subsequent behavior.

This solves a constant problem: how to expose new products and long tail without manual merchandising, while not polluting the top of the listing for long periods due to a random early success. Practical production behavior of bandit methods is discussed here: https://medium.com/@FunCorp/features-of-practical-use-of-various-algorithms-of-the-multi-armed-bandit-4da26d343693.

Product type vs browsing category

In e-commerce it is important to separate two concepts that are often mixed:

  • Product type. This is the core, unambiguous classification. A product has exactly one type - it cannot be two types at the same time.
  • Browsing category. This is navigation. A product of one type can appear in multiple browsing categories, because categories are built for different browsing and marketing scenarios.

My algorithm uses product types as the ranking unit, not browsing categories and not abstract groupings. Type is clearer and more stable: I compare “apples to apples” within a type.

Browsing categories are a navigation layer, not the foundation of statistical ranking.

Why I do not average across groups

Per-product statistics are usually sparse. Many products sell rarely, and traffic is uneven.

A tempting idea is to average signals across groups (brand, price band, category) and apply the group estimate back to each product. In merchandising, that often produces wrong conclusions:

  • groups mix different intents, prices, and traffic quality
  • one strong product can dominate a group and hide weak ones
  • a group can look good statistically while a specific product behaves differently

So I keep the system product-level and handle sparsity through uncertainty-aware logic plus traffic management (zones and randomness).

Zoning the page

A zone is a controlled block of the listing with a clear purpose. For example:

  • Zone 1 (above the fold): 6-12 products, focused on proven demand
  • Zone 2: leaders plus a slice of promising products
  • Zone 3: a broader mix
  • Zone 4: controlled exploration for long tail

Zones can also enforce practical constraints: stock, margin, exclusions, diversity (not showing too many near-duplicates), and so on.
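
One possible way to sketch the zone composition (the zone sizes and the scalar score input are illustrative assumptions; practical constraints like stock, margin, and diversity would plug in before the split):

```python
import random

def compose_listing(scored, zone_sizes=(8, 12, 20)):
    """scored: list of (product_id, score), higher = stronger signal.
    Splits the ranked list into zones of the given sizes (the remainder
    becomes the long-tail exploration zone), then shuffles inside each
    zone so no position is frozen."""
    ranked = [pid for pid, _ in sorted(scored, key=lambda x: -x[1])]
    zones, start = [], 0
    for size in zone_sizes:
        zones.append(ranked[start:start + size])
        start += size
    zones.append(ranked[start:])  # Zone 4: controlled exploration
    for zone in zones:
        random.shuffle(zone)      # simple controlled randomness
    return [pid for zone in zones for pid in zone]
```

Zone 1 favors proven demand simply because it takes the top of the ranked list; the shuffle inside each zone provides the rotation described in the next section.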

Simple randomness inside zones

Even with a good zone composition, a fixed order creates problems: top positions capture all exposure, lower positions do not collect data, and the system stops learning.

In my implementation the randomness is simple: within a block (or within equal-ranking conditions) I shuffle the order randomly. No anchors, no extra constraints, no additional sampling inside the zone.

This randomness provides:

  • natural rotation
  • data collection for items below the very top
  • protection from “eternal number one” behavior

Moving time window

To avoid dragging old history and to adapt automatically to current demand, I compute statistics only over the last N days.

Each day the window shifts by one day:

  • the oldest day drops out
  • a new day enters
  • counts are recomputed
  • scores and the listing update

This keeps the model current and gradually washes out one-time spikes.
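
A rough sketch of the daily window shift, assuming per-day event counts are available (class and field names are illustrative):

```python
from collections import Counter, deque
from datetime import date, timedelta

WINDOW_DAYS = 100

class MovingWindowCounts:
    """Keeps per-day event counts and a rolling total over the last
    WINDOW_DAYS days. Each day the oldest day drops out, the new day
    enters, and the total is adjusted incrementally."""
    def __init__(self):
        self.days = deque()   # (day, Counter of product_id -> count)
        self.total = Counter()

    def add_day(self, day, counts):
        self.days.append((day, Counter(counts)))
        self.total.update(counts)
        cutoff = day - timedelta(days=WINDOW_DAYS - 1)
        while self.days and self.days[0][0] < cutoff:
            _, old = self.days.popleft()
            self.total.subtract(old)
            self.total += Counter()  # drop zero/negative entries
```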

Why a ~100-day window is both statistically and operationally good

In practice, a quarter - about 90 days - is a very good smoother. It is long enough for stable statistics and short enough to react to changes in demand.

I use N = 100 days so the window stays “floating” and is not hard-aligned to quarter boundaries. The extra 10 days slightly blur quarter effects and avoid a rigid quarterly rhythm while keeping the same smoothing intent.

A ~100-day window is also convenient technically. With this horizon, the full event history needed for ranking can be stored and aggregated inside the e-commerce platform database.

The volumes stay manageable - typically millions of event rows, not tens or hundreds of millions. Modern databases handle this easily, so you can keep data hot, recompute aggregates fast, and avoid building heavy external analytics just to rank products.

Practical details from my production algorithm

Below are concrete engineering decisions from my live implementation that make the approach stable and deployable.

1) I only rank what makes sense to sell

Before ranking, I exclude products that should not enter the listing:

  • out of stock
  • zero or invalid price
  • negative margin
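
A minimal eligibility check along these lines might look like this (the field names and catalog shape are assumptions for illustration):

```python
def rankable(product):
    """Pre-ranking eligibility: only in-stock items with a valid
    price and non-negative margin enter the listing."""
    return (
        product.get("stock", 0) > 0
        and product.get("price", 0) > 0
        and product.get("margin", 0) >= 0
    )

catalog = [
    {"id": "a", "stock": 3, "price": 19.9, "margin": 4.0},
    {"id": "b", "stock": 0, "price": 19.9, "margin": 4.0},  # out of stock
    {"id": "c", "stock": 5, "price": 0.0, "margin": 4.0},   # invalid price
    {"id": "d", "stock": 5, "price": 9.9, "margin": -1.0},  # negative margin
]
eligible = [p["id"] for p in catalog if rankable(p)]  # ["a"]
```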

2) I reduce noise at the source

  • I only count product page views that came from internal listing or internal search.
  • I exclude sessions that look like scanners (for example, more than 40 product views).
  • I exclude one-off views.

This reduces false alarms and keeps the input signal cleaner.

3) The cascade reflects the funnel

Purchases are stronger than cart intent, and cart intent is stronger than simple interest. Products with no signals are shown randomly.

4) The practical score ordering

Within each product type, my sorting priority matches the cascade:

  • purchases
  • add-to-cart
  • qualified views

5) Shelves, type priority, and controlled randomness

To keep the listing structured (not a single endless sort), I use two layers:

  • within each product type, products are grouped into fixed-size shelves (for example, 12 items)
  • product types can be prioritized manually (type priority) if the business needs to push certain groups

Then the overall listing is built as a mix of shelves from different product types, with randomness added inside equal conditions. This provides rotation, data collection, and protection from frozen top positions.
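
The shelf mixing can be sketched as follows (shelf size of 12 is from the text; manual type priority is represented only by the iteration order of the dict, and the per-shelf shuffle is the randomness inside equal conditions):

```python
import random

SHELF_SIZE = 12

def build_shelves(ranked_by_type):
    """ranked_by_type: dict of product_type -> ranked product list
    (cascade order). Cuts each type into fixed-size shelves, then emits
    shelves in rounds across types so the listing mixes product types
    instead of exhausting one type before the next."""
    shelves = {
        t: [items[i:i + SHELF_SIZE]
            for i in range(0, len(items), SHELF_SIZE)]
        for t, items in ranked_by_type.items()
    }
    listing, round_no = [], 0
    while any(round_no < len(s) for s in shelves.values()):
        for t, s in shelves.items():
            if round_no < len(s):
                shelf = list(s[round_no])
                random.shuffle(shelf)  # rotation within equal conditions
                listing.extend(shelf)
        round_no += 1
    return listing
```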

Why I oppose manual pinning of “bestsellers”

I intentionally avoid the approach where someone manually declares “bestsellers” or pins specific products to fixed positions.

That manual optimization is usually not data-driven. It is often based on subjective beliefs: “we need to sell this product,” “let’s push that,” “this should be on top.” But commerce works the other way around.

The goal is not to “sell a product at any cost.” The goal is to help the customer solve their problem. That means surfacing what people actually choose and buy right now, within the relevant product type and audience behavior.

Manual pinning commonly leads to:

  • the listing stops learning because pinned positions absorb the exposure
  • the system loses connection to current demand and seasonality
  • the top freezes around internal preferences rather than customer needs

My approach is designed so decisions come from data, not tastes. If a product is truly relevant, it rises and stays up based on purchases and intent. If not, it naturally falls.

Result

The combination of zones, simple and strong signals, aggressive filtering, controlled randomness, and a moving window produces a system that:

  • pushes sales where it matters most (above the fold)
  • gives new and low-volume products a fair chance to prove themselves
  • adapts automatically as demand changes
  • does not freeze into the same order for months

References

1 - Bayesian Product Ranking at Wayfair (About Wayfair Tech Blog)
Link: https://www.aboutwayfair.com/tech-innovation/bayesian-product-ranking-at-wayfair
What to take from it: a real production example of Bayesian product ranking in a large e-commerce environment, especially how to handle sparse data and uncertainty without freezing the listing.

2 - Easy and effective way to improve sort order algorithm (Medium, @aankitgupta)
Link: https://medium.com/@aankitgupta/easy-and-effective-way-to-improve-sort-order-algorithm-87a423118a28
What to take from it: an engineering perspective on improving sorting without overcomplicating the model. It aligns with the principle "simple model plus clean signals."

3 - Features of Practical Use of Various Algorithms of the Multi-Armed Bandit (Medium, FunCorp)
Link: https://medium.com/@FunCorp/features-of-practical-use-of-various-algorithms-of-the-multi-armed-bandit-4da26d343693
What to take from it: practical aspects of bandit methods in production and why they fit the problem "give new products a chance, then cool down fast if demand is not confirmed."