High-dimensional Bayesian optimization with sparsity-inducing priors

This work was done in collaboration with Martin Jankowiak (Broad Institute of Harvard and MIT).

What the research is:

Bayesian optimization with sparse axis-aligned subspaces (SAASBO) is a new sample-efficient method for optimizing black-box functions that are expensive to evaluate. Bayesian optimization (BO) is a popular approach to black-box optimization, with machine learning (ML) hyperparameter tuning being a common application. While BO has had great success on low-dimensional problems with no more than 20 tunable parameters, most BO methods perform poorly on problems with hundreds of tunable parameters when only a small evaluation budget is available.

In this work we propose a new sample-efficient BO method that shows compelling performance on black-box optimization problems with as many as 388 tunable parameters. In particular, SAASBO performs well on challenging real-world problems where other BO methods struggle. Our main contribution is a new Gaussian process (GP) model that is better suited to high-dimensional search spaces. We propose a sparsity-inducing prior that leads to a GP model that quickly identifies the most important tunable parameters. We find that our SAAS model avoids overfitting in high-dimensional spaces and enables sample-efficient high-dimensional BO.

SAASBO has already seen several use cases at Facebook. For example, we used it for multi-objective Bayesian optimization in neural architecture search, where we want to explore the trade-off between model accuracy and on-device latency. As we show in another blog post, the SAAS model achieves much better model fits than a standard GP model for both the accuracy and on-device latency objectives. In addition, the SAAS model has also shown encouraging results for modeling the outcomes of online A/B tests, where standard GP models sometimes struggle to achieve good fits.

How it works:

BO with hundreds of tunable parameters presents several challenges, many of which stem from the complexity of high-dimensional spaces. A common failure mode of standard BO algorithms in high-dimensional spaces is that they tend to favor highly uncertain points near the boundary of the domain. Since the GP model is usually most uncertain there, these are often poor choices, leading to over-exploration and poor optimization performance. Our SAAS model places sparsity-inducing priors on the inverse squared lengthscales of the GP model, combined with a global shrinkage prior that controls the overall sparsity of the model. This results in a model in which most dimensions are “off”, which avoids overfitting.
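The prior hierarchy described above can be sketched in a few lines of NumPy. This is a minimal illustration only, not the paper's implementation; the dimensionality D, the half-Cauchy distributions, and the 0.1 scale and threshold are illustrative assumptions chosen to show the shrinkage effect.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 100  # number of tunable parameters (illustrative)

# Global shrinkage tau: a half-Cauchy sample; the small scale (assumed here)
# concentrates mass near zero.
tau = 0.1 * np.abs(rng.standard_cauchy())

# Per-dimension inverse squared lengthscales, shrunk toward zero by tau.
rho = tau * np.abs(rng.standard_cauchy(size=D))

# Heavy tails let a few rho_d escape the shrinkage while most stay near zero ("off").
n_on = int(np.sum(rho > 0.1))
print(f"dimensions effectively 'on': {n_on} / {D}")

def saas_rbf_kernel(x1, x2, rho):
    """RBF kernel with per-dimension inverse squared lengthscales rho.

    Dimensions with rho_d near zero contribute almost nothing to the
    distance, so the kernel effectively ignores them.
    """
    return np.exp(-0.5 * np.sum(rho * (x1 - x2) ** 2))
```

Under draws from this prior, most sampled inverse lengthscales are tiny, so the kernel, and hence the GP, is insensitive to most input dimensions until the data provide evidence otherwise.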

One appealing quality of the SAAS priors is that they are adaptive. As we gather more data, we can accumulate evidence that additional parameters are important, which effectively “switches on” more dimensions. This is in contrast to a standard GP model fit with maximum likelihood estimation (MLE), which generally has non-negligible inverse lengthscales for most dimensions since there is no mechanism for regularizing the lengthscales, often resulting in drastic overfitting in high-dimensional settings. We rely on the No-U-Turn Sampler (NUTS) for inference in the SAAS model, as we have found it to significantly outperform maximum a posteriori (MAP) estimation. In Figure 1 we compare the model fits, using 50 training points and 100 test points, of three different GP models on a 388-dimensional SVM problem. We see that the SAAS model provides well-calibrated out-of-sample predictions, while a GP fit with MLE and a GP with a weak prior fit with NUTS both overfit and show poor out-of-sample performance.
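To see why sparse lengthscales help, consider a toy GP regression in which the objective depends on only two of fifty input dimensions. The sketch below (a simplified illustration, not the paper's NUTS-based inference; the fixed lengthscale values, noise jitter, and problem sizes are all assumed) compares a model with only the relevant dimensions “on” against a model where every dimension has a non-negligible inverse lengthscale.

```python
import numpy as np

rng = np.random.default_rng(1)
D, n_train, n_test = 50, 40, 20

# Ground-truth black box: depends only on the first two of D dimensions.
def f(X):
    return np.sin(3 * X[:, 0]) + X[:, 1] ** 2

X_train = rng.uniform(-1, 1, size=(n_train, D))
X_test = rng.uniform(-1, 1, size=(n_test, D))
y_train = f(X_train)

def rbf_gram(A, B, rho):
    # Pairwise RBF kernel matrix with per-dimension inverse squared lengthscales rho.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2 * rho).sum(-1)
    return np.exp(-0.5 * d2)

def gp_predict(rho, noise=1e-3):
    # Standard GP posterior mean at the test points.
    K = rbf_gram(X_train, X_train, rho) + noise * np.eye(n_train)
    Ks = rbf_gram(X_test, X_train, rho)
    return Ks @ np.linalg.solve(K, y_train)

# Sparse model: only the two relevant dimensions switched "on".
rho_sparse = np.zeros(D)
rho_sparse[:2] = 4.0
# Dense model: every dimension gets the same non-negligible inverse lengthscale.
rho_dense = np.full(D, 4.0)

err_sparse = np.mean((gp_predict(rho_sparse) - f(X_test)) ** 2)
err_dense = np.mean((gp_predict(rho_dense) - f(X_test)) ** 2)
print(f"test MSE, sparse: {err_sparse:.4f}  dense: {err_dense:.4f}")
```

With all 50 dimensions active, every pair of points looks far apart, the kernel collapses toward zero, and the test predictions degrade; the sparse model generalizes far better from the same 40 observations, which is the behavior the SAAS prior encourages automatically.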

Figure 1: We compare a GP fit with MLE (left), a GP with a weak prior fit with NUTS (center), and a GP with a SAAS prior fit with NUTS (right). In each figure, mean predictions are shown as dots, while bars represent 95 percent confidence intervals.

Many approaches to high-dimensional BO try to reduce the effective dimensionality of the problem. For example, random projection methods like ALEBO and HeSBO work directly in a low-dimensional space, while a method like TuRBO restricts the region over which the acquisition function is optimized. SAASBO works directly in the high-dimensional space and instead relies on a sparsity-inducing prior to mitigate the curse of dimensionality. This offers several distinct advantages over existing methods. First, it preserves structure in the input domain, and can therefore exploit it, as opposed to methods that rely on random embeddings and risk scrambling that structure. Second, it is adaptive and shows little sensitivity to its hyperparameters. Third, it naturally accommodates both input and output constraints, in contrast to methods relying on random embeddings, for which input constraints present a particular challenge.

Why it matters:

Sample-efficient high-dimensional black-box optimization is an important problem, with ML hyperparameter tuning being a common application. In our recently published blog post on Bayesian multi-objective neural architecture search, we optimized a total of 24 hyperparameters, and using the SAAS model was critical to achieving good performance. Because of the high cost of training large ML models, we want to try as few hyperparameter configurations as possible, which requires a sample-efficient black-box optimization method.

We find that SAASBO performs well in demanding real-world applications and outperforms state-of-the-art Bayesian optimization methods. In Figure 2, we see that SAASBO outperforms other methods on a 100-dimensional rover trajectory planning problem, a 388-dimensional SVM hyperparameter tuning problem, and a 124-dimensional vehicle design problem.

Figure 2: For each method, we plot the mean of the best value found at a given iteration (top row). For each method, we show the distribution over the final approximate minimum as a violin plot, with horizontal bars corresponding to the 5 percent, 50 percent, and 95 percent quantiles (bottom row).

Read the full paper:

High-Dimensional Bayesian Optimization with Sparse Axis-Aligned Subspaces
