Enhancing experiment precision with machine learning

The challenge of noise in experiments

Experimentation is a central part of data-driven product development, but in practice the results of experiments are often too imprecise to usefully inform decisions. One possible response is to reduce statistical noise by simply running larger experiments. However, this is not always desirable, or even feasible. This raises the question of how we can make better use of the data we already have and obtain sharper, more precise experimental estimates without adding more people to the test.

In a collaboration between Meta’s Core Data Science and Experimentation Platform teams, we developed a new methodology for this problem that both provides formal statistical guarantees and is scalable enough to be put into practice. The work, described in detail in our NeurIPS paper, enables general machine learning (ML) techniques to be used in conjunction with experimental data to make experimental estimates significantly more precise than existing methods allow.

How it works

Our estimator, MLRATE (machine learning regression-adjusted treatment effect estimator), consists of two main steps. First, we train a model that predicts the experimental outcome of interest given a set of pre-experiment covariates. Second, we use these predictions as a control variable in a linear regression. The coefficient on the treatment indicator in that regression is our variance-reduced average treatment effect estimate.

In the first step, we use cross-fitting, so that the prediction for each observation is generated by a model trained on data that excludes that observation. This allows us to use a broad class of ML methods in the first step and gives us the flexibility to choose whichever model best predicts the outcome. The ML method in question may even be asymptotically biased, and need not be consistent in large samples, without affecting the validity of our estimator.
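As a concrete illustration, here is a minimal sketch of the cross-fitting step using scikit-learn, whose cross_val_predict produces exactly these out-of-fold predictions. The DataFrame df and its column names are hypothetical placeholders, and gradient-boosted trees simply stand in for whichever model predicts the outcome best.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

# df is a hypothetical DataFrame of experiment units; the covariate and
# outcome column names below are illustrative placeholders.
X = df[["pre_covariate_1", "pre_covariate_2", "pre_experiment_outcome"]]
y = df["outcome"].to_numpy()

# Cross-fitting: each unit's prediction comes from a model trained on
# folds that exclude that unit, so no observation predicts itself.
g_hat = cross_val_predict(GradientBoostingRegressor(), X, y, cv=5)
```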

In the second step, we treat the predictions from the first step as a control variable in a linear regression. This form of linear regression adjustment is relatively common when analyzing experimental data (e.g., Lin [2013], Deng et al. [2013]). The contribution of our paper is to show how this methodology can be generalized to handle control variables that are themselves the output of a potentially complex ML algorithm.
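Continuing the sketch, the second step is an ordinary least squares regression. The specification below, which regresses the outcome on the treatment indicator, the demeaned predictions, and their interaction, with robust standard errors, follows the Lin [2013]-style adjustment that the paper builds on; it is an illustration under those assumptions, not necessarily the paper’s exact estimator.

```python
import numpy as np
import statsmodels.api as sm

t = df["treatment"].to_numpy()       # 1 = treatment group, 0 = control
g_tilde = g_hat - g_hat.mean()       # demeaned cross-fitted predictions

# Regress the outcome on the treatment indicator, the demeaned prediction,
# and their interaction, with heteroskedasticity-robust standard errors.
design = sm.add_constant(np.column_stack([t, g_tilde, t * g_tilde]))
fit = sm.OLS(y, design).fit(cov_type="HC1")

ate = fit.params[1]                  # coefficient on the treatment indicator
ci_low, ci_high = fit.conf_int()[1]  # its 95% confidence interval
```

The interaction term lets the adjustment differ between treatment and control, which, as in Lin [2013], guards against the regression adjustment hurting precision.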

Empirical results

To quantify the variance reduction gains that can be expected from MLRATE in practice, we implemented it in A/A tests for a set of 48 outcome metrics that are commonly monitored in experiments at Meta. Using either gradient-boosted decision trees or elastic net regression for the ML prediction step, we find that MLRATE on average has over 70% lower variance than the simple difference-in-means estimator for these metrics, and about 19% lower variance than the conventional univariate approach, which adjusts only for pre-experiment values of the outcome.

Put another way, since the required sample size scales with estimator variance, these variance reductions translate directly into sample-size savings: to achieve the same precision as MLRATE, the simple difference-in-means estimator would require on average more than five times the sample size across these metrics, and the univariate linear regression approach about 1.6 times the sample size. The distribution of metric-level confidence interval widths, relative to the univariate-adjustment case, also shows significant differences in performance across metrics: for some, ML regression adjustment delivers only modest gains over univariate adjustment; for others, it shrinks confidence intervals drastically. Given the variety of metrics analyzed, this is to be expected: some outcomes, especially binary or discrete ones, benefit from more flexible predictive modeling, while others are already well predicted by simple linear models.

Why MLRATE is important in practice

Several features of this methodology make it relatively easy to implement in practice. First, the formulas used to compute treatment effect estimates and confidence intervals are no more complex than those of conventional linear regression adjustment. Second, most popular off-the-shelf ML methods can be used for the prediction step, as long as the covariates they use are measured pre-experiment. Finally, MLRATE does not require a fresh ML modeling investment for every single experiment to work well. Once a predictive model has been trained for an outcome of interest, it can be reused across many experiments, so the cost of ML training does not scale with the number of experiments.
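To make that last point concrete, here is a hypothetical sketch of the reuse pattern: a model is trained once on historical data for a metric and then applied across many experiments. The mlrate_regression helper stands for the regression step sketched earlier, and all data objects here are illustrative placeholders, not part of any real API.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Train once on historical, pre-experiment data for this outcome metric.
# X_historical and y_historical are hypothetical placeholders.
outcome_model = GradientBoostingRegressor().fit(X_historical, y_historical)

# Reuse the same trained model for every experiment tracking this metric,
# so ML training cost stays fixed as the number of experiments grows.
for exp in experiments:
    # Predictions are out-of-sample because the model was fit on units
    # outside the experiment, so no per-experiment cross-fitting is needed.
    g_hat = outcome_model.predict(exp.covariates)
    ate, ci = mlrate_regression(exp.outcome, exp.treatment, g_hat)
```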

If you are dealing with excessive noise in your experiments and can construct good predictors for the outcome of interest, MLRATE can be a helpful new tool for reducing variance. Depending on the metric, it can even determine whether an experiment is feasible at all. For more information, see our NeurIPS paper.
