The Rise of Two-Tower Models in Recommender Systems | by Samuel Flender | Oct, 2023

A deep-dive into the latest technology used to de-bias ranking models

Image by Evgeny Smirnov

Recommender systems are among the most ubiquitous Machine Learning applications in the world today. However, the underlying ranking models are plagued by numerous biases that can severely limit the quality of the resulting recommendations. The problem of building unbiased rankers, also known as unbiased learning to rank (ULTR), remains one of the most important research problems within ML, and it is still far from solved.

In this post, we'll take a deep-dive into one particular modeling approach that has relatively recently enabled the industry to control biases very effectively, and thus build vastly superior recommender systems: the two-tower model, in which one tower learns relevance and a second (shallow) tower learns biases.

While two-tower models have probably been used in the industry for several years, the first paper to formally introduce them to the broader ML community was Huawei's 2019 PAL paper.

PAL (Huawei, 2019): the OG two-tower model

Huawei's paper PAL ("position-aware learning to rank") considers the problem of position bias in the context of the Huawei app store.

Position bias has been observed over and over again in ranking models across the industry. It simply means that users are more likely to click on items that are shown first, whether because they're in a hurry, because they blindly trust the ranking algorithm, or for other reasons. Here's a plot demonstrating position bias in Huawei's data:

Position bias. Source: Huawei's PAL paper
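To make the effect concrete, here's a minimal sketch of how one might measure position bias from logged impressions. The `(position, clicked)` log format and the `ctr_by_position` helper are hypothetical illustrations, not the schema or code from the PAL paper:

```python
from collections import defaultdict

def ctr_by_position(impressions):
    """Aggregate click-through rate per display position.

    `impressions` is an iterable of (position, clicked) pairs,
    a hypothetical log format used here for illustration only.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for pos, click in impressions:
        shown[pos] += 1
        clicked[pos] += int(click)
    # CTR = clicks / impressions, keyed by position in ascending order
    return {pos: clicked[pos] / shown[pos] for pos in sorted(shown)}

# Toy log: items shown in earlier positions get clicked more often,
# regardless of how relevant they actually were.
log = [
    (1, True), (1, True), (1, False),
    (2, True), (2, False), (2, False),
    (3, False), (3, False),
]
print(ctr_by_position(log))  # CTR decays with position
```

A plot of this dictionary (CTR against position) would reproduce the decaying curve shown in the figure above.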

Position bias is a problem because we simply can't know whether users clicked on the first item because it was indeed the most relevant one for them, or merely because it was shown first. In recommender systems, we aim to optimize for the former learning objective, not the latter.

The solution proposed in the PAL paper is to factorize the learning problem as

p(click | x, position) = p(click | x, seen) × p(seen | position),
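The factorization can be sketched in a few lines of NumPy. This is a minimal illustration under assumed parameter shapes (a logistic relevance tower over features `x`, and a per-position propensity table standing in for the shallow bias tower); the function and parameter names are hypothetical, not PAL's actual implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pal_click_probability(x, position, relevance_weights, position_propensity):
    """PAL-style factorized click probability:

        p(click | x, position) = p(click | x, seen) * p(seen | position)

    - relevance tower: logistic model over item/user features x
    - shallow bias tower: one learned propensity per display position
    Both towers are trained jointly on the observed click signal.
    """
    p_click_given_seen = sigmoid(x @ relevance_weights)  # relevance tower
    p_seen = position_propensity[position]               # shallow bias tower
    return p_click_given_seen * p_seen

# Toy parameters for illustration only.
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
propensity = {1: 0.9, 2: 0.5}

# Training target: the product of the two towers.
p_train = pal_click_probability(x, 1, w, propensity)

# Serving: rank by the relevance tower alone, dropping the position
# term so the learned bias does not leak into the final ranking.
p_serve = sigmoid(x @ w)
```

The key design choice is that the position term is only multiplied in during training; at serving time the ranking is produced by the relevance tower alone.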
