Sample-efficient exploration of trade-offs with parallel expected hypervolume improvement
What the research is:
q-Expected Hypervolume Improvement (qEHVI) is a new sample-efficient method for optimizing multiple competing, expensive-to-evaluate black-box functions. Traditional methods for multi-objective black-box optimization include evolutionary strategies, which are robust and can efficiently generate large batches of candidate designs so that the true functions can be evaluated in parallel. However, they require many evaluations to converge to the Pareto-optimal trade-offs.
When evaluating the objectives is expensive, sample efficiency is critical. In this setting, Bayesian optimization is commonly used to select which designs to evaluate. Typically, candidates are generated one at a time (e.g., using expected hypervolume improvement). In addition, candidate generation usually involves numerically optimizing an acquisition function, which in existing approaches often does not provide gradients.
In this work, we propose a new acquisition function for multi-objective Bayesian optimization that 1) enables parallel or asynchronous generation of several candidates with proper propagation of uncertainty over the candidate points, 2) generates candidates quickly using exact gradients, 3) yields state-of-the-art optimization performance, and 4) has desirable theoretical convergence guarantees.
qEHVI has several use cases at Facebook. For example, it is used to tune parameters in Instagram's recommendation systems, where it enables product teams to understand the optimal trade-offs between user engagement and CPU utilization, and it has identified policies that improve both objectives simultaneously. qEHVI has also been used to optimize the reward functions of contextual bandit algorithms that determine video compression rates at upload time for Facebook and Instagram. This lets us identify the optimal trade-offs between video upload quality and reliability, which has led to improved quality of service.
How it works:
In multi-objective optimization, there is usually no single best solution. Rather, the goal is to identify the set of Pareto-optimal solutions, for which improving one objective means worsening another. A natural measure of the quality of a Pareto frontier in the outcome space is the hypervolume that is dominated by the Pareto frontier and bounded from below by a reference point. Without loss of generality, we assume that the goal is to maximize all objectives. The utility of a new candidate is its hypervolume improvement, that is, the volume in the outcome space that is dominated exclusively by the new point corresponding to the candidate (and not by the existing Pareto frontier). The hypervolume improvement region is usually not rectangular, but its volume can be computed efficiently by partitioning the non-dominated space into disjoint hyperrectangles.
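To make the geometry concrete, here is a minimal sketch for the two-objective case only, written in plain Python. The function names (`pareto_front`, `hypervolume`, `nondominated_cells`, `hypervolume_improvement`) are our own illustrative choices and are not the BoTorch API. The sketch assumes maximization of both objectives and computes the hypervolume improvement by summing the overlap of the new point's dominated box with each cell of the non-dominated partition.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[Point, Point]  # (lower vertex, upper vertex) of a rectangle


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Non-dominated subset of `points` (maximization), sorted by objective 1."""
    unique = set(points)
    return sorted(
        p for p in unique
        if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in unique)
    )


def _front_above_ref(points: Sequence[Point], ref: Point) -> List[Point]:
    # Points that do not strictly improve on the reference point add no volume.
    return [p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]]


def hypervolume(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded from below by `ref`."""
    area, prev_y = 0.0, ref[1]
    # Walk the front from the largest first objective to the smallest; the
    # second objective increases along the way, so each point adds one slab.
    for x, y in reversed(_front_above_ref(points, ref)):
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


def nondominated_cells(points: Sequence[Point], ref: Point) -> List[Cell]:
    """Partition the non-dominated region above `ref` into disjoint rectangles."""
    front = _front_above_ref(points, ref)
    xs = [ref[0]] + [p[0] for p in front] + [math.inf]
    ys = [p[1] for p in front] + [ref[1]]
    # Cell k spans objective 1 in (xs[k], xs[k+1]] and objective 2 above ys[k].
    return [((xs[k], ys[k]), (xs[k + 1], math.inf)) for k in range(len(ys))]


def hypervolume_improvement(new: Point, points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `new` but not by the existing front."""
    total = 0.0
    for (lx, ly), (ux, uy) in nondominated_cells(points, ref):
        width = min(new[0], ux) - lx
        height = min(new[1], uy) - ly
        if width > 0 and height > 0:
            total += width * height
    return total


front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
ref = (0.0, 0.0)
print(hypervolume(front, ref))                          # 6.0
print(hypervolume_improvement((2.5, 2.5), front, ref))  # 1.25
```

The cell-based result agrees with the brute-force difference `hypervolume(front + [new], ref) - hypervolume(front, ref)`, which is a handy sanity check. In more than two objectives the partitioning step is more involved, and this sketch does not cover it.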
To generate candidates in parallel, we compute the joint hypervolume improvement across several new points, using the inclusion-exclusion principle to compute the volume of the union of the overlapping hyperrectangles. Since we do not know the objective values of a new candidate point a priori, we integrate over the uncertainty in the unobserved objective values provided by our probabilistic surrogate model (typically a Gaussian process) and use the expected hypervolume improvement over the new candidate points as our acquisition function.
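The two ingredients described above, inclusion-exclusion for the joint improvement and averaging over sampled outcomes, can be sketched for two objectives as follows. This is an illustrative toy, not the BoTorch implementation: the helper names are hypothetical, and the sampler assumes independent Gaussian outcomes per candidate and objective, whereas a real Gaussian process posterior is joint across candidates.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[Point, Point]  # (lower vertex, upper vertex) of a rectangle


def nondominated_cells(front: Sequence[Point], ref: Point) -> List[Cell]:
    """Disjoint rectangles covering the non-dominated region above `ref`.

    `front` is assumed to be a 2-D Pareto front (maximization) whose points
    all strictly dominate `ref`.
    """
    pts = sorted(front)  # objective 1 ascending, hence objective 2 descending
    xs = [ref[0]] + [p[0] for p in pts] + [math.inf]
    ys = [p[1] for p in pts] + [ref[1]]
    return [((xs[k], ys[k]), (xs[k + 1], math.inf)) for k in range(len(ys))]


def joint_hvi(new_points: Sequence[Point], cells: Sequence[Cell]) -> float:
    """Area of the union of the regions newly dominated by `new_points`."""
    total = 0.0
    for size in range(1, len(new_points) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0  # inclusion-exclusion sign
        for subset in itertools.combinations(new_points, size):
            # The region dominated by every point in the subset is the box
            # dominated by their component-wise minimum.
            cx = min(p[0] for p in subset)
            cy = min(p[1] for p in subset)
            for (lx, ly), (ux, uy) in cells:
                width = min(cx, ux) - lx
                height = min(cy, uy) - ly
                if width > 0 and height > 0:
                    total += sign * width * height
    return total


def mc_expected_joint_hvi(
    means: Sequence[Point],
    stds: Sequence[Point],
    cells: Sequence[Cell],
    num_samples: int = 256,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the expected joint hypervolume improvement.

    Assumption: outcomes are independent Gaussians (a simplification).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        sample = [
            (rng.gauss(m[0], s[0]), rng.gauss(m[1], s[1]))
            for m, s in zip(means, stds)
        ]
        total += joint_hvi(sample, cells)
    return total / num_samples


cells = nondominated_cells([(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)], (0.0, 0.0))
batch = [(2.5, 2.5), (3.5, 1.5)]
print(joint_hvi(batch, cells))  # 2.25
print(mc_expected_joint_hvi(batch, [(0.3, 0.3), (0.3, 0.3)], cells))
```

Each of the two points alone improves the hypervolume by 1.25, but their improvement regions overlap by 0.25, so the joint improvement is 2.25 rather than 2.5. Note that the number of subsets grows exponentially with the batch size, so this direct enumeration is only practical for small batches.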
Why it matters:
Generating and evaluating designs in parallel is important for fast end-to-end optimization. For example, when optimizing the hyperparameters of machine learning models, you can often evaluate many hyperparameter settings in parallel by distributing the evaluations across a cluster of machines. Because evaluations are expensive, generating high-quality candidates is critical. In many existing methods, the numerical optimization to find the maximizers of the acquisition function is very slow due to the lack of gradient information. Our acquisition function is differentiable, which enables gradient-based optimization and thus faster convergence and better candidates. In addition, the computation is highly parallelizable: the acquisition function has constant time complexity given infinitely many cores, and in many practical scenarios it can be computed efficiently using GPU acceleration. We show empirically that our acquisition function achieves state-of-the-art optimization performance across a variety of benchmark problems.
In addition, we provide theoretical convergence guarantees for optimizing the acquisition function. Improving sample efficiency is important for accelerating current initiatives, ranging from ranking systems and AutoML to materials design and robotics, and for opening the door to new optimization problems that require expensive and/or time-consuming evaluations of black-box functions.
Read the full paper:
Differentiable Expected Hypervolume Improvement for Parallel Multi-Objective Bayesian Optimization
Check out our open source implementations:
qEHVI is available as part of Ax, our open source library for adaptive experimentation. The underlying algorithm is implemented in BoTorch, and researchers in Bayesian optimization can find implementation details there.