constants). By combining Stan's state-of-the-art No-U-Turn sampler with bridgesampling, researchers are provided with a general-purpose, easy-to-use computational solution to the
challenging task of comparing complex Bayesian models.
As practical advice, we recommend keeping the following four points in mind when using the bridgesampling package (see also Gronau et al. 2017c, 2018). First, one should always
check the posterior samples carefully. A successful application of bridge sampling requires a
sufficient number of representative samples from the posterior distribution. Thus, it is important to use efficient sampling algorithms and, in the case of MCMC sampling, it is crucial
that researchers confirm that the chains have converged to the joint posterior distribution.
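For models fitted with rstan, such a check could look as follows (a minimal sketch; fit_H1 denotes a hypothetical stanfit object):

print(fit_H1)                          # inspect Rhat (should be close to 1) and n_eff
rstan::check_hmc_diagnostics(fit_H1)   # divergences, treedepth, E-BFMI warnings
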
In addition, researchers need to make sure that the model does not contain any discrete pa-
rameters, since those are currently not supported. This may sound more restrictive than it is. In practice, the solution is to marginalize out the discrete parameters, something that is often possible (see the sketch at the end of this paragraph). Note the similarity to Stan, which also deals with discrete parameters by marginal-
izing them out (Stan Development Team 2017, section 15). Furthermore, as demonstrated
in the examples, for conducting model comparisons based on bridge sampling, the number of
posterior samples often needs to be an order of magnitude larger than for estimation. This
of course depends on a number of factors, such as the complexity of the model of interest, the number of posterior samples that one usually uses for estimation, the posterior sampling algorithm used, and the desired accuracy of the marginal likelihood estimate.
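To make the first point concrete, the following sketch (hypothetical, not one of the examples discussed in this article) fits a two-component normal mixture in which the discrete component indicators are marginalized out via log_mix(). Note that the target += notation also retains the normalizing constants required for bridge sampling, and that the number of iterations is an order of magnitude beyond what estimation alone would need:

library(rstan)
library(bridgesampling)

mixture_code <- "
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real<lower=0, upper=1> theta;  // mixing weight
  ordered[2] mu;                 // ordered component means (identifiability)
}
model {
  target += beta_lpdf(theta | 1, 1);
  target += normal_lpdf(mu | 0, 10);
  // discrete component indicators marginalized out:
  for (n in 1:N)
    target += log_mix(theta,
                      normal_lpdf(y[n] | mu[1], 1),
                      normal_lpdf(y[n] | mu[2], 1));
}
"

# `y` denotes a hypothetical data vector
fit_H1 <- sampling(stan_model(model_code = mixture_code),
                   data = list(N = length(y), y = y),
                   iter = 50500, warmup = 500, chains = 4, cores = 4)
ml_H1 <- bridge_sampler(fit_H1, silent = TRUE)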
Second, one should always assess the uncertainty of the bridge sampling estimate. If the uncertainty is deemed too high, one can attempt to achieve higher precision by increasing the number of posterior samples or, for method = "normal", by using the more sophis-
ticated method = "warp3" instead (see the third point below). Users of the bridgesampling
package have different options for assessing the estimation uncertainty. In our opinion, the
“gold standard” may be to obtain an empirical uncertainty assessment by repeating the bridge
sampling procedure multiple times, each time using a fresh set of posterior samples. This ap-
proach allows users to assess the uncertainty directly for the quantity of interest. For instance,
if the focus is on computing a Bayes factor, users may repeat the following steps: (a) obtain
posterior samples for both models, (b) use the bridge_sampler function to estimate the log
marginal likelihoods, (c) compute the Bayes factor using the bf function. The variability of
these Bayes factor estimates across repetitions then provides an assessment of the uncertainty.
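In code, this repetition scheme might look as follows (a schematic sketch; fit_H0_fresh() and fit_H1_fresh() stand for hypothetical wrappers that rerun the posterior sampling step, so that every repetition uses a fresh set of posterior samples):

bf_estimates <- replicate(10, {
  ml_H0 <- bridge_sampler(fit_H0_fresh(), silent = TRUE)  # steps (a) and (b)
  ml_H1 <- bridge_sampler(fit_H1_fresh(), silent = TRUE)
  bf(ml_H1, ml_H0)$bf                                     # step (c)
})
summary(bf_estimates)  # spread across repetitions quantifies the uncertainty
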
For certain applications, this approach may be infeasible due to computational restrictions. If
this is the case and method = "normal", we recommend using the approximate errors based on Frühwirth-Schnatter (2004), which are available through the error_measures function.
As mentioned before, we have found these approximate errors to work well for method =
"normal", but not for method = "warp3" which is the reason why they are not available for
the latter method. Alternatively, one can also assess the estimation uncertainty by setting
the repetitions argument to an integer larger than one. This provides an assessment of the
estimation uncertainty due to variability in the samples from the proposal distribution, but
it should be kept in mind that this does not take into account variability in the posterior
samples.
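Both options can be illustrated in a short sketch, reusing the hypothetical fit_H1 and ml_H1 objects from above:

# approximate relative error of Frühwirth-Schnatter (2004); only available
# for method = "normal"
error_measures(ml_H1)$percentage

# rerun only the bridge sampling step on the same posterior samples; this
# reflects variability due to the proposal samples, not variability in the
# posterior samples
ml_H1_rep <- bridge_sampler(fit_H1, repetitions = 10, silent = TRUE)
error_measures(ml_H1_rep)  # min, max, and interquartile range of the estimates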
Third, one should consider whether using the more time-consuming Warp-III method may
be beneficial. The accuracy of the estimate is governed not only by the number of samples,
but also by the overlap between the posterior and the proposal distribution (e.g., Meng and
Wong 1996; Meng and Schilling 2002). The bridgesampling package attempts to maximize