Auto-placement of ad campaigns using multi-armed bandits
What the research is:
We consider the problem of optimally splitting an advertiser’s budget across multiple placements when both demand and value are unknown. Imagine an advertiser using the Facebook platform to promote a product, with a daily budget to spend on our platform. Advertisers want to reach users where they spend their time, so they spread their budget across multiple platforms such as Facebook, Instagram, and others. They want an algorithm that bids on their behalf on the different platforms, and they increasingly rely on automation products to do this.
In this study we model the placement optimization problem as a stochastic bandit problem. The algorithm participates in k different auctions, one for each platform, and has to determine the right bid for each of them. It receives an overall budget B (e.g., the daily budget) and a time horizon T over which this budget is to be spent. At each time step, the algorithm chooses a bid for each of the k platforms, and these bids are entered into the auctions for the next requests on the respective platforms. At the end of a round (i.e., a series of requests), the algorithm observes the reward (e.g., number of clicks) and the budget spent on each platform. Based on this history, it determines the next set of bid multipliers to place. The goal is to maximize the advertiser’s total value within the specified budget across all k platforms.
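To make the interaction protocol concrete, here is a minimal simulation sketch of the setting just described. It has k platforms, a total budget B, and a horizon T. Each round the bidder submits one bid multiplier per platform and then observes per-platform reward and spend. The classes `PlatformModel` and `MultiPlatformEnv` are hypothetical, and so is the response model inside them: click probability linear in the bid, and Gaussian spend noise. They only illustrate the protocol and are not the paper's model or Meta's auction.

```python
import random
from dataclasses import dataclass


@dataclass
class PlatformModel:
    """Hypothetical response model for one platform (illustration only)."""
    base_click_prob: float    # click probability per unit of bid multiplier
    cost_per_bid_unit: float  # mean spend per round per unit of bid multiplier

    def click_prob(self, bid: float) -> float:
        # Higher bids win more auctions; cap the probability at 1.
        return min(1.0, self.base_click_prob * bid)

    def mean_spend(self, bid: float) -> float:
        return self.cost_per_bid_unit * bid


class MultiPlatformEnv:
    """Simulated rounds: k platforms, total budget B, horizon T."""

    def __init__(self, platforms, budget, horizon, seed=0):
        self.platforms = list(platforms)
        self.remaining = budget  # unspent part of B
        self.horizon = horizon   # T
        self.t = 0
        self.rng = random.Random(seed)

    def done(self):
        return self.t >= self.horizon or self.remaining <= 0

    def step(self, bids):
        """Enter one bid per platform; return per-platform (rewards, spends)."""
        if len(bids) != len(self.platforms):
            raise ValueError("need exactly one bid per platform")
        self.t += 1
        rewards, spends = [], []
        for model, bid in zip(self.platforms, bids):
            if self.remaining <= 0:
                # Budget exhausted: no more auctions are entered.
                rewards.append(0.0)
                spends.append(0.0)
                continue
            # Noisy spend around its mean, clipped to [0, remaining budget].
            spend = max(0.0, self.rng.gauss(model.mean_spend(bid), 0.1))
            spend = min(spend, self.remaining)
            self.remaining -= spend
            clicked = self.rng.random() < model.click_prob(bid)
            rewards.append(1.0 if clicked else 0.0)
            spends.append(spend)
        return rewards, spends
```

For example, `MultiPlatformEnv([PlatformModel(0.05, 0.4), PlatformModel(0.08, 0.7)], budget=5.0, horizon=100)` can be driven with `env.step([1.0, 0.5])` until `env.done()`. Spend is clipped so the total never exceeds B, which mirrors the hard budget constraint.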
How it works:
This problem can be addressed using a bandit model called bandits with budgets. In this work we propose a modified algorithm that is optimal in the regime where the number of platforms k is large and the total achievable value is small relative to the total number of rounds. Online advertising data exhibits exactly this behavior. Because of the size of the competing pool, the budget spent and the total value the advertiser receives are much smaller than the total number of auctions the advertiser participates in. Our algorithm is therefore a significant improvement over previous work, which usually focuses on the regime in which the total achievable value is comparable to the number of time steps.
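Here is a purely illustrative calculation with hypothetical numbers, not drawn from the paper or its data. Write OPT for the largest total value, such as clicks, that can be collected within the budget over T rounds. An advertiser who enters a million auctions but can collect at most a few hundred clicks sits deep in the regime OPT ≪ T. That advertiser is even below √T, the threshold that rounding-based methods from prior work require:

```latex
\[
T = 10^{6}, \quad \mathrm{OPT} = 400
\;\Longrightarrow\;
\frac{\mathrm{OPT}}{T} = 4 \times 10^{-4},
\qquad
\sqrt{T} = 10^{3} > \mathrm{OPT}.
\]
```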
The key idea of the algorithm is to modify a primal-dual approach from previous work (Badanidiyuru, Kleinberg, and Slivkins, 2013) so that it can handle multiple platforms. In particular, we derive a new optimization program for each time step, whose optimal solution gives the bid multipliers to place at that step. Previous work (Sankararaman and Slivkins, 2018) typically also requires a rounding step after solving such a program. However, this rounding step only works well if the optimal value is at least √T, so there the assumption that the optimal value is comparable to the number of time steps is unavoidable. In this work, we instead rely on a property of this linear program (Avadhanula et al., 2016). We show that for the special case of multi-platform bid optimization the optimal solution is already integral, so no rounding step is needed. This is the key idea that leads to an optimal regret guarantee.
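The paper's exact per-round program is not spelled out here, so the sketch below substitutes a simpler, generic Lagrangian (primal-dual) rule to illustrate the flavor of the approach. The class `PrimalDualBidder`, the discrete bid grid, the UCB/LCB confidence bonus, and the dual step size are all hypothetical choices. Each round, every platform picks the bid that maximizes an optimistic reward minus a price `lam` times an optimistic (low) cost. `lam` is then raised when the round overspends the per-round budget B/T and lowered otherwise. Because this Lagrangian separates by platform, each choice is a plain argmax and is integral by construction. That shows why avoiding a rounding step is convenient, but it is not the paper's integrality argument for its linear program.

```python
import math
from collections import defaultdict


class PrimalDualBidder:
    """Hypothetical Lagrangian bidder over (platform, bid) arms; a sketch only."""

    def __init__(self, num_platforms, bid_grid, budget, horizon, step_size=0.5):
        self.k = num_platforms
        self.bid_grid = list(bid_grid)
        self.per_round_budget = budget / horizon  # B / T
        self.step_size = step_size                # dual step size (assumed)
        self.lam = 0.0                            # dual price of one unit of budget
        self.t = 0
        self.n = defaultdict(int)
        self.reward_sum = defaultdict(float)
        self.cost_sum = defaultdict(float)

    def _score(self, arm):
        if self.n[arm] == 0:
            return math.inf  # try every (platform, bid) arm at least once
        mean_r = self.reward_sum[arm] / self.n[arm]
        mean_c = self.cost_sum[arm] / self.n[arm]
        bonus = math.sqrt(2.0 * math.log(max(self.t, 2)) / self.n[arm])
        ucb_reward = mean_r + bonus          # optimistic value estimate
        lcb_cost = max(0.0, mean_c - bonus)  # optimistic (low) cost estimate
        return ucb_reward - self.lam * lcb_cost

    def choose_bids(self):
        """Return one bid per platform: an argmax per platform, no rounding."""
        self.t += 1
        return [max(self.bid_grid, key=lambda b: self._score((j, b)))
                for j in range(self.k)]

    def update(self, bids, rewards, spends):
        """Record per-platform feedback, then take a projected dual step."""
        for j, (b, r, c) in enumerate(zip(bids, rewards, spends)):
            arm = (j, b)
            self.n[arm] += 1
            self.reward_sum[arm] += r
            self.cost_sum[arm] += c
        # Raise the price when the round overspends B/T; never let it go below 0.
        overspend = sum(spends) - self.per_round_budget
        self.lam = max(0.0, self.lam + self.step_size * overspend)
```

In use, one would alternate `bids = bidder.choose_bids()` with `bidder.update(bids, rewards, spends)` using the per-platform feedback observed at the end of each round. The dual price `lam` is what couples the otherwise independent per-platform choices through the shared budget.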
We use logged data to show that this algorithm works well in practice, with desirable properties such as steady budget consumption and low regret. We compare it with previous work and with other heuristics used in industry, and show that the proposed algorithm outperforms all of them.
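How the paper computes its metrics on logged data is not described here. As a hedged illustration, the hypothetical helpers below compute two simple quantities one might track. The first is regret against a benchmark value, for example the value of the best fixed bids in hindsight, however that benchmark is estimated. The second is the largest gap between cumulative spend and an even spend plan B·t/T, expressed as a fraction of B. A small gap corresponds to the steady budget consumption mentioned above.

```python
from itertools import accumulate


def cumulative_regret(benchmark_value, rewards):
    """Hypothetical metric: benchmark value minus the value actually collected."""
    return benchmark_value - sum(rewards)


def max_pacing_error(spends, budget):
    """Hypothetical metric: worst gap between cumulative spend and the even
    plan budget * t / T over all rounds t, as a fraction of the budget."""
    if not spends:
        return 0.0
    horizon = len(spends)
    gaps = (abs(spent - budget * t / horizon)
            for t, spent in enumerate(accumulate(spends), start=1))
    return max(gaps) / budget
```

For instance, spending a budget of 4 evenly over 4 rounds gives a pacing error of 0. Spending all of it in the first round gives 0.75.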
Why it matters:
On the business side, this work offers potential benefits for advertisers, users, and platforms. Automated products that handle much of the targeting, placement, and creative optimization on behalf of advertisers are seeing rapidly growing adoption, including among larger companies. The main challenges for these automated products are scalability and budget management: the number of possible combinations grows exponentially, while the total budget provided by the advertiser stays roughly the same. This research provides simple, scalable algorithms that can help build such automated solutions by setting bids automatically and in real time within the auction mechanism. The bid is one of the main levers advertisers use to steer ad delivery toward the desired behavior. Typically, however, advertisers set bids as a black box, because they lack the data needed to make optimal bid decisions. With the proposed algorithm, bidding is close to optimal, so advertisers get the most value for their spend. This benefits both the individual advertiser and the ecosystem as a whole.
On the research side, “bandits with budgets” has mainly been studied as a mathematical problem in the theoretical computer science and operations research communities. This research bridges the gap between the theory and practice of these algorithms by applying them to a large, important problem. Along the way, we also develop a new, simpler algorithm that is optimal in the parameter ranges that matter for this application.
We hope our paper will open the door to new applications of this extremely general and versatile model, both inside and outside online advertising. We believe this work offers substantial research potential, both for developing new algorithms and for addressing core business problems.
Read the full paper:
Stochastic bandits for multi-platform budget optimization in online advertising
– Badanidiyuru, Ashwinkumar, Robert Kleinberg, and Aleksandrs Slivkins. “Bandits with knapsacks.” 2013 IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2013.
– Sankararaman, Karthik Abinav, and Aleksandrs Slivkins. “Combinatorial semi-bandits with knapsacks.” International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 2018.
– Avadhanula, Vashist, et al. “On the tightness of an LP relaxation for rational optimization and its applications.” Operations Research Letters 44.5 (2016): 612-617.