Creating a standard for benchmarking probabilistic programming languages
PPL Bench is an open source benchmark framework for evaluating probabilistic programming languages (PPLs) used for statistical modeling. Researchers can use PPL Bench to create their own reference implementations (a number of PPLs are already included) and compare them in an apples-to-apples evaluation. It is designed to give researchers a standard way to evaluate improvements in PPLs and to help researchers and engineers select the PPL best suited to their applications.
What it is:
PPLs allow statisticians to write probabilistic models in a formal language. Over the past two decades, the number of PPLs available to researchers and data scientists has exploded, and each has its own advantages and disadvantages. Some PPLs limit the range of models they can express, while others are universal languages, meaning that they support any computable probability distribution. Depending on performance requirements, some PPLs are better suited to a given use case than others. This means that the PPL community needs a standard benchmarking process to measure inference performance.
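To make this concrete, here is a minimal sketch of the kind of model a PPL lets you write, using NumPyro as an illustrative PPL (the coin-flip model and data are our own example, not part of PPL Bench): the user declares a prior and a likelihood, and the language supplies a general-purpose inference engine.

```python
import jax.numpy as jnp
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def coin_model(flips):
    # Prior over the coin's bias, then a Bernoulli likelihood for each observed flip.
    theta = numpyro.sample("theta", dist.Beta(1.0, 1.0))
    numpyro.sample("obs", dist.Bernoulli(theta), obs=flips)

# Illustrative data: 8 observed coin flips.
flips = jnp.array([1, 0, 1, 1, 0, 1, 1, 1])

# The PPL's inference engine (here, the NUTS sampler) does the heavy lifting.
mcmc = MCMC(NUTS(coin_model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), flips=flips)
mcmc.print_summary()
```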
PPL Bench uses predictive log probability as a standard measurement. We believe this is the most uniform way to measure inference accuracy and convergence rates for all types of PPLs, regardless of how the model is represented or how the inference engine is implemented. PPL Bench also reports other common metrics used to evaluate statistical models, including effective sample size, R-hat, and inference time.
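As a rough sketch of how that metric can be computed (our own illustration, not PPL Bench's implementation), the predictive log probability of held-out data is approximated by averaging the likelihood of that data over posterior samples, done in log space for numerical stability. The function name and the Beta-distributed "posterior" samples below are hypothetical stand-ins for samples an actual PPL would return:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import bernoulli

def predictive_log_prob(posterior_thetas, test_flips):
    """Monte Carlo estimate of log p(test | train) for the coin-flip model.

    Approximates log E_posterior[ p(test | theta) ] by averaging the held-out
    likelihood over posterior samples, computed in log space.
    """
    # log p(test | theta_s) for each posterior sample theta_s
    log_liks = np.array([
        bernoulli.logpmf(test_flips, theta).sum() for theta in posterior_thetas
    ])
    # log( (1/S) * sum_s exp(log_liks[s]) )
    return logsumexp(log_liks) - np.log(len(log_liks))

# Illustrative usage: fake posterior samples of the coin's bias and held-out flips.
rng = np.random.default_rng(0)
thetas = rng.beta(7, 3, size=1000)      # stand-in for real posterior samples
test = np.array([1, 1, 0, 1, 1, 0, 1])
print(predictive_log_prob(thetas, test))
```

Tracking how this quantity rises as more posterior samples are drawn (or as inference time elapses) is what allows accuracy and convergence to be compared across PPLs.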
Why it matters:
As part of the PPL research community, we believe that a standardized mechanism for comparing PPLs will accelerate the development of better and faster languages for probabilistic modeling. We hope that community contributions will help grow and diversify the bank of models and PPL implementations, and encourage wider industrial use of PPLs.
Read the full paper:
PPL Bench: Evaluation framework for probabilistic programming languages
Get it on GitHub: