bridgesampling: An R Package for Estimating

Normalizing Constants

Quentin F. Gronau

University of Amsterdam

Henrik Singmann

University of Zurich

Eric-Jan Wagenmakers

University of Amsterdam

Abstract

Statistical procedures such as Bayes factor model selection and Bayesian model averag-

ing require the computation of normalizing constants (e.g., marginal likelihoods). These

normalizing constants are notoriously diﬃcult to obtain, as they usually involve high-

dimensional integrals that cannot be solved analytically. Here we introduce an R package

that uses bridge sampling (Meng and Wong 1996; Meng and Schilling 2002) to estimate

normalizing constants in a generic and easy-to-use fashion. For models implemented in

Stan, the estimation procedure is automatic. We illustrate the functionality of the package

with three examples.

Keywords: bridge sampling, marginal likelihood, model selection, Bayes factor, Warp-III.

1. Introduction

In many statistical applications, it is essential to obtain normalizing constants of the form

Z =

Z

Θ

q(θ) dθ, (1)

where p(θ) = q(θ)/Z denotes a probability density function (pdf) deﬁned on the domain

Θ ⊆ R

p

. For instance, the estimation of normalizing constants plays a crucial role in free

energy estimation in physics, missing data analyses in likelihood-based approaches, Bayes

factor model comparisons, and Bayesian model averaging (e.g., Gelman and Meng 1998). In

this article, we focus on the role of the normalizing constant in Bayesian inference; however,

the bridgesampling package can be used in any context where one desires to estimate a

normalizing constant.

In Bayesian inference, the normalizing constant of the joint posterior distribution is involved in

(a) parameter estimation, where the normalizing constant ensures that the posterior integrates

to one; (b) Bayes factor model comparison, where the ratio of normalizing constants quantiﬁes

the data-induced change in beliefs concerning the relative plausibility of two competing models

(e.g., Kass and Raftery 1995); (c) Bayesian model averaging, where the normalizing constant

is required to obtain posterior model probabilities (BMA; Hoeting et al. 1999).

For Bayesian parameter estimation, the need to compute the normalizing constant can usu-

ally be circumvented by the use of sampling approaches such as Markov chain Monte Carlo

(MCMC; e.g., Gamerman and Lopes 2006). However, for Bayes factor model comparison and

BMA, the normalizing constant of the joint posterior distribution – in this context usually

arXiv:1710.08162v3 [stat.CO] 18 Sep 2018

2 bridgesampling

called marginal likelihood – remains of essential importance. This is evident from the fact

that the posterior model probability of model M

i

, i ∈ {1, 2, . . . , m}, given data y is obtained

as

p(M

i

| y)

| {z }

posterior model probability

=

p(y | M

i

)

P

m

j=1

p(y | M

j

) p(M

j

)

| {z }

updating factor

× p(M

i

)

| {z }

prior model probability

, (2)

where p(y | M

i

) denotes the marginal likelihood of model M

i

.

If the model comparison involves only two models, M

1

and M

2

, it is convenient to consider

the odds of one model over the other. Bayes’ rule yields:

p(M

1

| y)

p(M

2

| y)

| {z }

posterior odds

=

p(y | M

1

)

p(y | M

2

)

| {z }

Bayes factor BF

12

×

p(M

1

)

p(M

2

)

| {z }

prior odds

. (3)

The change in odds brought about by the data is given by the ratio of the marginal likelihoods

of the models and is known as the Bayes factor (Jeﬀreys 1961; Kass and Raftery 1995; Etz and

Wagenmakers 2017). Equation (2) and Equation (3) highlight that the normalizing constant

of the joint posterior distribution, that is, the marginal likelihood, is required for computing

both posterior model probabilities and Bayes factors.

The marginal likelihood is obtained by integrating out the model parameters with respect to

their prior distribution:

p(y | M

i

) =

Z

Θ

p(y | θ, M

i

) p(θ | M

i

) dθ. (4)

The marginal likelihood implements the principle of parsimony also known as Occam’s razor

(e.g., Jeﬀerys and Berger 1992; Myung and Pitt 1997; Vandekerckhove et al. 2015). Unfor-

tunately, the marginal likelihood can be computed analytically for only a limited number of

models. For more complicated models (e.g., hierarchical models), the marginal likelihood is

a high-dimensional integral that usually cannot be solved analytically. This computational

hurdle has complicated the application of Bayesian model comparisons for decades.

To overcome this hurdle, a range of diﬀerent methods have been developed that vary in ac-

curacy, speed, and complexity of implementation: naive Monte Carlo estimation, importance

sampling, the generalized harmonic mean estimator, Reversible Jump MCMC (Green 1995),

the product-space method (Carlin and Chib 1995; Lodewyckx et al. 2011), Chib’s method

(Chib 1995), thermodynamic integration (e.g., Lartillot and Philippe 2006), path sampling

(Gelman and Meng 1998), and others. The ideal method is fast, accurate, easy to implement,

general, and unsupervised, allowing non-expert users to treat it as a “black box”.

In our experience, one of the most promising methods for estimating normalizing constants

is bridge sampling (Meng and Wong 1996; Meng and Schilling 2002). Bridge sampling is a

general procedure that performs accurately even in high-dimensional parameter spaces such

as those that are regularly encountered in hierarchical models. In fact, simpler estimators

such as the naive Monte Carlo estimator, the generalized harmonic mean estimator, and

importance sampling are special sub-optimal cases of the bridge identity described in more

detail below (e.g., Fr

¨

uhwirth–Schnatter 2004; Gronau et al. 2017b).

In this article, we introduce bridgesampling, an R (R Core Team 2016) package that enables

the straightforward and user-friendly estimation of the marginal likelihood (and of normalizing

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 3

constants more generally) via bridge sampling techniques. In general, the user needs to provide

to the bridge_sampler function four quantities that are readily available:

• an object with posterior samples (argument samples);

• a function that computes the log of the unnormalized posterior density for a set of model

parameters (argument log_posterior);

• a data object that contains the data and potentially other relevant quantities for eval-

uating log_posterior (argument data);

• lower and upper bounds for the parameters (arguments lb and ub, respectively).

Given these inputs, the bridgesampling package provides an estimate of the log marginal

likelihood.

Figure 1 displays the steps that a user may take when using the bridgesampling package.

Starting from the top, the user provides the basic required arguments to the bridge_sampler

function which then produces an estimate of the log marginal likelihood. With this esti-

mate in hand – usually for at least two diﬀerent models – the user can compute posterior

model probabilities using the post_prob function, Bayes factors using the bf function, and

approximate estimation errors using the error_measures function. A schematic call of the

bridge_sampler function looks as follows (detailed examples are provided in the next sec-

tions):

R> bridge_sampler(samples = samples, log_posterior = log_posterior,

+ data = data, lb = lb, ub = ub)

The bridge_sampler function is an S3 generic which currently has methods for objects of

class mcmc, mcmc.list (Plummer et al. 2006), stanfit (Stan Development Team 2016a),

matrix, rjags (Plummer 2016; Su and Yajima 2015), runjags (Denwood 2016), stanreg

(Stan Development Team 2016b), and for MCMC_refClass objects produced by nimble (de

Valpine et al. 2017).

1

This allows the user to obtain posterior samples in a convenient and

eﬃcient way, for instance, via JAGS (Plummer 2003) or a highly customized sampler. Hence,

bridge sampling does not require users to program their own MCMC routines to obtain

posterior samples; this convenience is usually missing for methods such as Reversible Jump

MCMC (but see Gelling et al. 2017).

When the model is speciﬁed in Stan (Carpenter et al. 2017; Stan Development Team 2016a) –

in a way that retains the constants, as described below – obtaining the marginal likelihood is

even simpler: the user only needs to pass the stanfit object to the bridge_sampler function.

The combination of Stan and the bridgesampling package therefore produces an unsupervised,

black box computation of the marginal likelihood.

This article is structured as follows: First we describe the implementation details of the

algorithm from bridgesampling; second, we illustrate the functionality of the package using a

simple Bayesian t-test example where posterior samples are obtained via JAGS. In this section,

we also explain a heuristic to obtain the function that computes the log of the unnormalized

1

We thank Ben Goodrich for adding the stanreg method to our package and Perry de Valpine for his help

implementing the nimble support.

4 bridgesampling

Basic&Arguments:

samples

log_posterior

data

lb

ub

bridge_sampler()

Basic&Output:

Object&of&class&

"bridge"&or&

"bridge_list"&&

with&log&marginal&

likelihood&estimate(s)

bf()

post_prob()

error_measures()

Approximate&

estimation&error

Bayes&factor

Posteri or&model&

probabilities

Figure 1: Flow chart of the steps that a user may take when using the bridgesampling

package. In general, the user needs to provide a posterior samples object (samples), a func-

tion that computes the log of the unnormalized posterior density (log_posterior), the data

(data), and parameter bounds (lb and ub). The bridge_sampler function then produces

an estimate of the log marginal likelihood. This is usually repeated for at least two diﬀerent

models. The user can then compute posterior model probabilities (using the post_prob func-

tion), Bayes factors (using the bf function), and approximate estimation errors (using the

error_measures function). Note that the summary method for bridge objects automatically

invokes the error_measures function. Figure available at https://tinyurl.com/ybf4jxka

under CC license https://creativecommons.org/licenses/by/2.0/.

posterior density in JAGS; third, we describe in more detail the interface to Stan which enables

an even more automatized computation of the marginal likelihood. Fourth, we illustrate use of

the Stan interface with two well-known examples from the Bayesian model selection literature.

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 5

2. Bridge sampling: the algorithm

Bridge sampling can be thought of as a generalization of simpler methods for estimating nor-

malizing constants such as the naive Monte Carlo estimator, the generalized harmonic mean

estimator, and importance sampling (e.g., Fr

¨

uhwirth–Schnatter 2004; Gronau et al. 2017b).

These simpler methods typically use samples from a single distribution, whereas bridge sam-

pling combines samples from two distributions.

2

For instance, in its original formulation

(Meng and Wong 1996), bridge sampling was used to estimate a ratio of two normalizing con-

stants such as the Bayes factor. In this scenario, the two distributions for the bridge sampler

are the posteriors for each of the two models involved. However, the accuracy of the estimator

depends crucially on the overlap between the two involved distributions; consequently, the

accuracy can be increased by estimating a single normalizing constant at a time, using as a

second distribution a convenient normalized proposal distribution that closely matches the

distribution of interest (e.g., Gronau et al. 2017b; Overstall and Forster 2010). The bridge

sampling estimator of the marginal likelihood is then given by:

3

p(y) =

E

g(θ)

[h(θ) p(y | θ) p(θ)]

E

p(θ|y)

[h(θ) g(θ)]

≈

1

n

2

P

n

2

j=1

h(

˜

θ

j

) p(y |

˜

θ

j

) p(

˜

θ

j

)

1

n

1

P

n

1

i=1

h(θ

∗

i

) g(θ

∗

i

)

, (5)

where h(θ) is called the bridge function and g(θ) denotes the proposal distribu-

tion. {θ

∗

1

, θ

∗

2

, . . . , θ

∗

n

1

} denote n

1

samples from the posterior distribution p(θ |y) and

{

˜

θ

1

,

˜

θ

2

, . . . ,

˜

θ

n

2

} denote n

2

samples from the proposal distribution g(θ).

To use bridge sampling in practice, one has to specify the bridge function h(θ) and the proposal

distribution g(θ). For the bridge function h(θ), the bridgesampling package implements the

optimal choice presented in Meng and Wong (1996) which minimizes the relative mean-squared

error of the estimator. Using this particular bridge function, the bridge sampling estimate of

the marginal likelihood is obtained via an iterative scheme that updates an initial guess of the

marginal likelihood ˆp(y)

(0)

until convergence (for details, see Meng and Wong 1996; Gronau

et al. 2017b). The estimate at iteration t + 1 is obtained as follows:

ˆp(y)

(t+1)

=

1

n

2

n

2

P

j=1

l

2,j

s

1

l

2,j

+s

2

ˆp(y)

(t)

1

n

1

n

1

P

i=1

1

s

1

l

1,i

+s

2

ˆp(y)

(t)

, (6)

where l

1,i

=

p(y|θ

∗

i

) p(θ

∗

i

)

g(θ

∗

i

)

, and l

2,j

=

p(y|

˜

θ

j

) p(

˜

θ

j

)

g(

˜

θ

j

)

. In practice, a more numerically stable version

of Equation (6) is implemented that uses logarithms in combination with the Brobdingnag R

package (Hankin 2007) to avoid numerical under- and overﬂow (for details, see Gronau et al.

2017b, Appendix B).

The iterative scheme usually converges within a few iterations. Note that, crucially, l

1,i

and

l

2,j

need only be computed once before the iterative updating scheme is started. In practice,

evaluating l

1,i

and l

2,j

takes up most of the computational time. Luckily, l

1,i

and l

2,j

can

2

Note, however, that these simpler methods are special cases of bridge sampling (e.g., Gronau et al. 2017b,

Appendix A). Hence, for particular choices of the bridge function and the proposal distribution, only samples

from one distribution are used.

3

We omit conditioning on the model for enhanced legibility. It should be kept in mind, however, that this

yields the estimate of the marginal likelihood for a particular model M

i

, that is, p(y | M

i

).

6 bridgesampling

be computed completely in parallel for each i ∈ {1, 2, . . . , n

1

} and each j ∈ {1, 2, . . . , n

2

},

respectively. That is, in contrast to MCMC procedures, the evaluation of, for instance,

l

1,i+1

does not require one to evaluate l

1,i

ﬁrst (since the posterior samples and proposal

samples are already available). The bridgesampling package enables the user to compute

l

1,i

and l

2,j

in parallel by setting the argument cores to an integer larger than one. On

Unix/macOS machines, this parallelization is implemented using the parallel package. On

Windows machines this is achieved using the snowfall package (Knaus 2015).

4

After having speciﬁed the bridge function, one needs to choose the proposal distribution g(θ).

The bridgesampling package implements two diﬀerent choices: (a) a multivariate normal

proposal distribution with mean vector and covariance matrix that match the respective

posterior samples quantities and (b) a standard multivariate normal distribution combined

with a warped posterior distribution.

5

Both choices increase the eﬃciency of the estimator

by making the proposal and the posterior distribution as similar as possible. Note that

under the optimal bridge function, the bridge sampling estimator is robust to the relative tail

behavior of the posterior and the proposal distribution. This stands in sharp contrast to the

importance and the generalized harmonic mean estimator for which unwanted tail behavior

produces estimators with very large or even inﬁnite variances (e.g., Owen and Zhou 2000;

Fr

¨

uhwirth–Schnatter 2004; Gronau et al. 2017b).

2.1. Option I: the multivariate normal proposal distribution

The ﬁrst choice for the proposal distribution that is implemented in the bridgesampling pack-

age is a multivariate normal distribution with mean vector and covariance matrix that match

the respective posterior samples quantities. This choice (henceforth “the normal method”)

generalizes to high dimensions and accounts for potential correlations in the joint poste-

rior distribution. This proposal distribution is obtained by setting the argument method =

"normal" in the bridge_sampler function; this is the default setting. This choice assumes

that all parameters are allowed to range across the entire real line. In practice, this assump-

tion may not be fulﬁlled for all components of the parameter vector, however, it is usually

possible to transform the parameters so that this requirement is met. This is achieved by

transforming the original p-dimensional parameter vector θ (which may contain components

that range only across a subset of R) to a new parameter vector ξ (where all components are

allowed to range across the entire real line) using a diﬀeomorphic vector-valued function f so

that ξ = f (θ). By the change-of-variable rule, the posterior density with respect to the new

parameter vector ξ is given by:

p(ξ | y) = p

θ

(f

−1

(ξ) | y)

det

J

f

−1

(ξ)

, (7)

where p

θ

(f

−1

(ξ) | y) refers to the untransformed posterior density with respect to θ evaluated

for f

−1

(ξ) = θ. J

f

−1

(ξ) denotes the Jacobian matrix with the element in the i-th row and

j-th column given by

∂θ

i

∂ξ

j

. Crucially, the posterior density with respect to ξ retains the

normalizing constant of the posterior density with respect to θ; hence, one can select a

convenient transformation without changing the normalizing constant. Note that in order to

4

Due to technical limitations speciﬁc to Windows, this parallelization is not available for the stanfit and

stanreg methods.

5

Note that other proposal distributions such as multivariate t distributions are conceivable but are currently

not implemented in the bridgesampling package.

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 7

Table 1: Overview of built-in transformations in the bridgesampling package. l denotes a pa-

rameter lower bound and u denotes an upper bound. Φ(·) denotes the cumulative distribution

function (cdf) and φ(·) the probability density function (pdf) of the normal distribution.

Type Transformation Inv.-Transformation Jacobian Contribution

unbounded ξ

i

= θ

i

θ

i

= ξ

i

∂θ

i

∂ξ

i

= 1

lower-bounded ξ

i

= log (θ

i

− l) θ

i

= exp (ξ

i

) + l

∂θ

i

∂ξ

i

= exp (ξ

i

)

upper-bounded ξ

i

= log (u −θ

i

) θ

i

= u − exp (ξ

i

)

∂θ

i

∂ξ

i

= exp (ξ

i

)

double-bounded ξ

i

= Φ

−1

θ

i

−l

u−l

θ

i

= (u − l) Φ (ξ

i

) + l

∂θ

i

∂ξ

i

= (u − l) φ (ξ

i

)

apply a transformation no new samples are required; instead the original samples can simply

be transformed using the function f .

In principle, users can select transformations themselves. Nevertheless, the bridgesampling

package comes with a set of built-in transformations (see Table 1), allowing the user to work

with the model in a familiar parameterization. When the user then supplies a named vector

with lower and upper bounds for the parameters (arguments lb and ub, respectively), the

package internally transforms the relevant parameters and adjusts the expressions by the

Jacobian term. Furthermore, as will be elaborated upon below, when the model is ﬁtted in

Stan, the bridgesampling package takes advantage of the rich class of Stan transformations.

The transformations built into the bridgesampling package are useful whenever each com-

ponent of the parameter vector can be transformed separately.

6

In this scenario, there are

four possible cases per parameter: (a) the parameter is unbounded; (b) the parameter has

a lower bound (e.g., variance parameters); (c) the parameter has an upper bound; and (d)

the parameter has a lower and an upper bound (e.g., probability parameters). As shown in

Table 1, in case (a) the identity (i.e., no) transformation is applied. In case (b) and (c), loga-

rithmic transformations are applied to transform the parameter to the real line. In case (d) a

probit transformation is applied. Note that internally, the posterior density is automatically

adjusted by the relevant Jacobian term. Since each component is transformed separately,

the resulting Jacobian matrix will be diagonal. This is convenient since it implies that the

absolute value of the determinant is the product of the absolute values of the diagonal entries

of the Jacobian matrix:

det

J

f

−1

(ξ)

=

p

Y

i=1

∂θ

i

∂ξ

i

. (8)

Once all posterior samples have been transformed to the real line, a multivariate normal

distribution is ﬁtted using method-of-moments. On a side note, bridge sampling may under-

estimate the marginal likelihood when the same posterior samples are used both for ﬁtting

the proposal distribution and for the iterative updating scheme (i.e., Equation (6)). Hence,

6

Thanks to a recent pull request by Kees Mulder, the bridgesampling package now also supports a more

complicated case in which multiple parameters are constrained jointly (i.e., simplex parameters). This pull

request also added support for circular parameters.

8 bridgesampling

as recommended by Overstall and Forster (2010), the bridgesampling package divides each

MCMC chain into two halves, using the ﬁrst half for ﬁtting the proposal distribution and the

second half for the iterative updating scheme.

2.2. Option II: warping the posterior distribution

The second choice for the proposal distribution that is implemented in the bridgesampling

package is a standard multivariate normal distribution in combination with a warped posterior

distribution. The goal is still to match the posterior and the proposal distribution as closely

as possible. However, instead of manipulating the proposal distribution, it is ﬁxed to a

standard multivariate normal distribution, and the posterior distribution is manipulated (i.e.,

warped). Crucially, the warped posterior density retains the normalizing constant of the

original posterior density. The general methodology is referred to as Warp bridge sampling

(Meng and Schilling 2002).

There exist several variants of Warp bridge sampling; in the bridgesampling package, we

implemented Warp-III bridge sampling (Meng and Schilling 2002; Overstall 2010; Gronau

et al. 2017c) which can be used by setting method = "warp3". This version matches the ﬁrst

three moments of the posterior and the proposal distribution. That is, in contrast to the

simpler normal method described above, Warp-III not only matches the mean vector and the

covariance matrix of the two distributions, but also the skewness. Consequently, when the

posterior distribution is skewed, Warp-III may result in an estimator that is less variable.

When the posterior distribution is symmetric, both Warp-III and the normal method should

yield estimators that are about equally eﬃcient. Hence, in principle, Warp-III should always

provide estimates that are at least as precise as the normal method. However, the Warp-III

method also takes about twice as much time to execute as the normal method; the reason for

this is that Warp-III sampling results in a mixture density (for details, see Overstall 2010;

Gronau et al. 2017c) which requires that the unnormalized posterior density is evaluated twice

as often as in the normal method.

Figure 2 illustrates the intuition for the warping procedure in the univariate case. The gray

histogram in the top-left panel depicts skewed posterior samples, the solid black line the

standard normal proposal distribution. The Warp-III procedure eﬀectively standardizes the

posterior samples so that they have mean zero (top-right panel) and variance one (bottom-

right panel), and then attaches a minus sign with probability 0.5 to the samples which achieves

symmetry (bottom-left panel). This intuition naturally generalizes to the multivariate case.

Starting with posterior samples that can range across the entire real line (i.e., ξ) the multi-

variate Warp-III procedure is based on the following stochastic transformation:

η = b

|{z}

symmetry

× R

−1

|{z}

covariance I

× (ξ − µ)

| {z }

mean 0

, (9)

where b ∼ B(0.5) on {−1, 1} and µ corresponds to the expected value of ξ (i.e., the mean

vector).

7

The matrix R is obtained via the Cholesky decomposition of the covariance matrix

of ξ, denoted as Σ, hence, Σ = RR

>

. Bridge sampling is then applied using this warped

posterior distribution in combination with a standard multivariate normal distribution.

7

B(θ) denotes a Bernoulli distribution with success probability θ.

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 9

Warp-0

-10 -5 0 5 10

0.0

0.1

0.2

0.3

0.4

0.5

Density

mean = 0

Warp-I

-10 -5 0 5 10

0.0

0.1

0.2

0.3

0.4

0.5

Density

variance = 1

Warp-II

-10 -5 0 5 10

0.0

0.1

0.2

0.3

0.4

0.5

Density

symmetry

Warp-III

-10 -5 0 5 10

0.0

0.1

0.2

0.3

0.4

0.5

Density

Figure 2: Illustration of the warping procedure. The black solid line shows the standard

normal proposal distribution and the gray histogram shows the posterior samples. Avail-

able at https://tinyurl.com/y7owvsz3 under CC license https://creativecommons.org/

licenses/by/2.0/ (see also Gronau et al. 2017c, 2018).

2.3. Estimation error

Once the marginal likelihood has been estimated, the user can obtain an estimate of the es-

timation error in a number of diﬀerent ways. One method is to use the error_measures

function which is an S3 generic. Note that the summary method for objects returned

by bridge_sampler internally calls the error_measures function and thus provides a

convenient summary of the estimated log marginal likelihood and the estimation uncer-

tainty. For marginal likelihoods estimated with the "normal" method and repetitions

= 1, the error_measures function provides an approximate relative mean-squared error of

the marginal likelihood estimate, an approximate coeﬃcient of variation, and an approximate

percentage error. The relative mean-squared error of the marginal likelihood estimate is given

by:

RE

2

=

E

h

ˆp(y) − p(y)

2

i

p(y)

2

. (10)

The bridgesampling package computes an approximate relative mean-squared error of the

marginal likelihood estimate based on the derivation by Fr

¨

uhwirth–Schnatter (2004) which

takes into account that the samples from the proposal distribution are independent, whereas

10 bridgesampling

the samples from the posterior distribution may be autocorrelated (e.g., when using MCMC

sampling procedures).

Under the assumption that the bridge sampling estimator ˆp(y) is unbiased, the square root of

the expected relative mean-squared error (Equation (10)) can be interpreted as the coeﬃcient

of variation (i.e., the ratio of the standard deviation and the mean). To facilitate interpreta-

tion, the bridgesampling package also provides a percentage error which is obtained by simply

converting the coeﬃcient of variation to a percentage.

Note that the error_measures function can currently not be used to obtain approximate

errors for the "warp3" method with repetitions = 1. The reason is that, in our experience,

the approximate errors appear to be unreliable in this case.

There are two further methods for assessing the uncertainty of the marginal likelihood esti-

mate. These methods are computationally more costly than computing approximate errors,

but are available for both the "normal" method and the "warp3" method. The ﬁrst option is

to set the repetitions argument of the bridge_sampler function to an integer larger than

one. This allows the user to obtain an empirical estimate of the variability across repeated

applications of the method. Applying the error_measures function to the output of the

bridge_sampler function that has been obtained with repetitions set to an integer large

than one provides the user with the minimum/maximum log marginal likelihood estimate

across repetitions and the interquartile range of the log marginal likelihood estimates. Note

that this procedure assesses the uncertainty of the estimate conditional on the posterior sam-

ples, that is, in each repetition new samples are drawn from the proposal distribution, but

the posterior samples are ﬁxed across repetitions.

In case the user is able to easily draw new samples from the posterior distribution, the

second option is to repeatedly call the bridge_sampler function, each time with new posterior

samples. This way, the user obtains an empirical assessment of the variability of the estimate

which takes into account both uncertainty with respect to the samples from the proposal and

also from the posterior distribution. If computationally feasible, we recommend this method

for assessing the estimation error of the marginal likelihood.

After having outlined the underlying bridge sampling algorithm, we next demonstrate the

capabilities of the bridgesampling package using three examples. Additional examples are

available as vignettes at: https://cran.r-project.org/package=bridgesampling

3. Toy example: Bayesian t-test

We start with a simple statistical example: a Bayesian paired-samples t-test (Jeﬀreys 1961;

Rouder et al. 2009; Ly et al. 2016; Gronau et al. 2017a). We use R’s sleep data set (Cushny

and Peebles 1905) which contains measurements for the eﬀect of two soporiﬁc drugs on ten

patients. Two diﬀerent drugs where administered to the same ten patients and the dependent

measure was the average number of hours of sleep gained compared to a control night in

which no drug was administered. Figure 3 shows the increase in sleep (in hours) of the ten

patients for each of the two drugs. To test whether the two drugs diﬀer in eﬀectiveness, we

can conduct a Bayesian paired-samples t-test.

The null hypothesis H

0

states that the n diﬀerence scores d

i

, i = 1, 2, . . . , n, where n = 10,

follow a normal distribution with mean zero and variance σ

2

, that is, d

i

∼ N(0, σ

2

). The

alternative hypothesis H

1

states that the diﬀerence scores follow a normal distribution with

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 11

1 2

-2

0

2

4

6

Increase in Sleep (Hours)

Drug

Figure 3: The sleep data set (Cushny and Peebles 1905). The left violin plot displays the

distribution of the increase in sleep (in hours) of the ten patients for the ﬁrst drug, the right

violin plot displays the distribution of the increase in sleep (in hours) of the ten patients for

the second drug. Boxplots and the individual observations are superimposed. Observations

for the same participant are connected by a line. Figure available at https://tinyurl.com/

yalskr23 under CC license https://creativecommons.org/licenses/by/2.0/.

mean µ = σδ, where δ denotes the standardized eﬀect size, and variance σ

2

, that is, d

i

∼

N(σδ, σ

2

). Jeﬀreys’s prior is assigned to the variance σ

2

so that p(σ

2

) ∝ 1/σ

2

and a zero-

centered Cauchy prior with scale parameter r = 1/

√

2 is assigned to the standardized eﬀect

size δ (for details, see Rouder et al. 2009; Ly et al. 2016; Morey and Rouder 2015).

In this example, we are interested in computing the Bayes factor BF

10

which quantiﬁes how

much more likely the data are under H

1

(i.e., there is a diﬀerence between the two drugs)

than under H

0

(i.e., there is no diﬀerence between the two drugs) by using the bridgesampling

package. For this example, the Bayes factor can also be easily computed using the BayesFactor

package (Morey and Rouder 2015), allowing us to compare the results from the bridgesampling

package to the correct answer.

The ﬁrst step is to obtain posterior samples. In this example, we use JAGS in order to sample

from the models. Here we focus on how to compute the log marginal likelihood for H

1

. The

steps for obtaining the log marginal likelihood for H

0

are analogous. After having speciﬁed

the model corresponding to H

1

as the character string code_H1, posterior samples can be

12 bridgesampling

obtained using the R2jags package (Su and Yajima 2015) as follows:

8

R> library("R2jags")

R> data("sleep")

R> y <- sleep$extra[sleep$group == 1]

R> x <- sleep$extra[sleep$group == 2]

R> d <- x - y # compute difference scores

R> n <- length(d)

R> set.seed(1)

R> jags_H1 <- jags(data = list(d = d, n = n, r = 1 / sqrt(2)),

+ parameters.to.save = c("delta", "inv_sigma2"),

+ model.file = textConnection(code_H1), n.chains = 3,

+ n.iter = 16000, n.burnin = 1000, n.thin = 1)

Note the relatively large number of posterior samples; reliable estimates for the quantities

of interest in testing usually necessitate many more posterior samples than are required for

estimation. As a rule of thumb, we suggest that testing requires about an order of magnitude

more posterior samples than estimation.

Next, we need to specify a function that take as input a named vector with parameter values

and a data object, and returns the log of the unnormalized posterior density (i.e., the log

of the integrand in Equation (4)). This function is easily speciﬁed by inspecting the JAGS

model. As a heuristic, one only needs to consider the model code where a “∼” sign appears.

The log of the densities on the right-hand side of these “∼” symbols needs to be evaluated for

the relevant quantities and then these log density values are summed.

9

Using this heuristic,

we obtain the following unnormalized log posterior density function for H

1

:

R> log_posterior_H1 <- function(pars, data) {

+ delta <- pars["delta"] # extract parameter

+ inv_sigma2 <- pars["inv_sigma2"] # extract parameter

+ sigma <- 1 / sqrt(inv_sigma2) # convert precision to sigma

+ out <-

+ dcauchy(delta, scale = data$r, log = TRUE) + # prior

+ dgamma(inv_sigma2, 0.0001, 0.0001, log = TRUE) + # prior

+ sum(dnorm(data$d, sigma * delta, sigma, log = TRUE)) # likelihood

+ return(out)

+ }

The ﬁnal step before we can compute the log marginal likelihoods is to specify named vectors

with the parameter bounds:

R> lb_H1 <- rep(-Inf, 2)

R> ub_H1 <- rep(Inf, 2)

R> names(lb_H1) <- names(ub_H1) <- c("delta", "inv_sigma2")

R> lb_H1[["inv_sigma2"]] <- 0

8

The complete code (including the JAGS models and the code for H

0

) can be found in the supplemental

material and also on the Open Science Framework: https://osf.io/3yc8q/.

9

This heuristic assumes that the model does not include other random quantities that are generated during

sampling, such as posterior predictives.

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 13

The log marginal likelihood for H

1

can then be obtained by calling the bridge_sampler

function as follows:

R> library("bridgesampling")

R> set.seed(12345)

R> bridge_H1 <- bridge_sampler(samples = jags_H1,

+ log_posterior = log_posterior_H1,

+ data = list(d = d, n = n, r = 1 / sqrt(2)),

+ lb = lb_H1, ub = ub_H1)

We obtain:

R> print(bridge_H1)

Bridge sampling estimate of the log marginal likelihood: -27.17103

Estimate obtained in 5 iteration(s) via method "normal".

Note that by default, the "normal" bridge sampling method is used.

Next, we can use the error_measures function to obtain an approximate percentage error of

the estimate:

R> error_measures(bridge_H1)$percentage

[1] "0.087%"

The small approximate percentage error indicates that the marginal likelihood has been esti-

mated reliably. As mentioned before, we can use the summary method to obtain a convenient

summary of the bridge sampling estimate and the estimation error. We obtain:

R> summary(bridge_H1)

Bridge sampling log marginal likelihood estimate

(method = "normal", repetitions = 1):

-27.17103

Error Measures:

Relative Mean-Squared Error: 7.564225e-07

Coefficient of Variation: 0.0008697255

Percentage Error: 0.087%

Note:

All error measures are approximate.

After having computed the log marginal likelihood estimate for H

0

in a similar fashion, we

can compute the Bayes factor for H

1

over H

0

using the bf function:

14 bridgesampling

R> bf(bridge_H1, bridge_H0)

Estimated Bayes factor in favor of bridge_H1 over bridge_H0: 17.26001

Hence, the observed data are about 17 times more likely under H

1

(which assigns the stan-

dardized eﬀect size δ a zero-centered Cauchy prior with scale r = 1/

√

2) than under H

0

(which ﬁxes δ to zero). This is strong evidence for a diﬀerence in eﬀectiveness between the

two drugs (Jeﬀreys 1939, Appendix I). The estimated Bayes factor closely matches the Bayes

factor obtained with the BayesFactor package (i.e., BF

10

= 17.259).

4. A “black box” Stan interface

The previous section demonstrated how the bridgesampling package can be used to estimate

the marginal likelihood for models coded in JAGS. For custom samplers, the steps needed to

compute the marginal likelihood are the same. What is required is (a) an object with posterior

samples; (b) a function that computes the log of the unnormalized posterior density; (c) the

data; and (d) parameter bounds. A crucial step is the speciﬁcation of the unnormalized log

posterior density function. For applied researchers, this step may be challenging and error-

prone, whereas for experienced statisticians it might be tedious and cumbersome, especially

for complex models with a hierarchical structure.

In order to facilitate the computation of the marginal likelihood even further, the bridgesam-

pling package contains an interface to the generic sampling software Stan (Carpenter et al.

2017). Assisted by the rstan package (Stan Development Team 2016a), this interface allows

users to skip steps (b)-(d) above. Speciﬁcally, users who ﬁt their models in Stan (in a way that

retains the constants, as is detailed below) can obtain an estimate of the marginal likelihood

by simply passing the stanfit object to the bridge_sampler function.

The implementation of this “black box” functionality proﬁted from the fact that, just as the

bridgesampling package, Stan’s No-U-Turn sampler internally operates on unconstrained pa-

rameters (Hoﬀman and Gelman 2014; Stan Development Team 2017). The rstan package

provides access to these unconstrained parameters and the corresponding log of the unnor-

malized posterior density. This means that users can ﬁt models with parameter types that

have more complicated constraints than those currently built into bridgesampling (e.g., co-

variance/correlation matrices) without having to hand-code the appropriate transformations.

As mentioned above, in order to use the bridgesampling package in combination with Stan

the models need to be implemented in a way that retains the constants. This can be

achieved relatively easily: instead of writing, for instance, y ~ normal(mu, sigma) or y

~ bernoulli(theta), one needs to write

target += normal_lpdf(y | mu, sigma);

and

target += bernoulli_lpmf(y | theta);

That is, one starts with the ﬁxed expression target += which is then followed by the name

of the distribution (e.g., normal). The name of the distribution is followed by _lpdf for

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 15

continuous distributions and _lpmf for discrete distributions. Finally, in parentheses, there

is the variable that was to the left of the “~” sign (here, y), then a “|” sign, and ﬁnally the

arguments of the distribution. This achieves that the user speciﬁes the log target density (in

this case, the log of the unnormalized posterior density) in a way that retains the constants

of the involved distributions.

Note that in case the distributions are truncated, the user needs to code the correct renormal-

ization. For instance, a normal distribution with upper truncation at upper is implemented

as follows

target += normal_lpdf(y | mu, sigma) - normal_lcdf(upper | mu, sigma);

where the function normal_lcdf yields the log of the cumulative distribution function (cdf)

of the normal distribution. Likewise, a normal distribution with lower truncation at lower is

obtained as

target += normal_lpdf(y | mu, sigma) - normal_lccdf(lower | mu, sigma);

where normal_lccdf yields the log of the complementary cumulative distribution function

(ccdf) of the normal distribution (i.e., the log of one minus the cumulative distribution function

of the normal distribution). A normal distribution with lower truncation point lower and

upper truncation point upper can be implemented as follows:

target += normal_lpdf(y | mu, sigma) -

log_diff_exp(normal_lcdf(upper | mu, sigma),

normal_lcdf(lower | mu, sigma));

where log_diff_exp(a, b) is a numerically more stable version of the operation

log (exp (a) − exp (b)). Note that when implementing a truncated distribution, it is of course

also important to give the variable of interest the correct bounds. For instance, for the last

example where y has a lower truncation at lower and an upper truncation at upper the

variable y should be declared as

10

real<lower = lower, upper = upper> y;

For more details about how to implement truncated distributions in Stan we refer the user to

the Stan manual (Stan Development Team 2017, section 5.3, “Truncated Distributions”).

In sum, the bridgesampling package enables users to obtain an estimate of the marginal

likelihood for any Stan model (programmed to retain the constants) simply by passing the

stanfit object to the bridge_sampler function. Next we demonstrate this functionality

using two prototypical examples in Bayesian model selection.

10

Note that we assumed that y is a scalar. In general, y could also be declared as a vector or an array in

Stan. In this case, the term that is subtracted for renormalization would need to be multiplied by the number

of elements of y. For example, for the case of an upper truncation and a vector y of length k the code would

need to be changed to: target += normal_lpdf(y | mu, sigma) - k * normal_lcdf(upper | mu, sigma);

For another example, see the code for “Stan example 2”.

16 bridgesampling

0 5 10 15 20 25 30 35

Clutch Identifier

3

4

5

6

7

8

9

Birth Weight (Grams)

Figure 4: Data for 244 newborn turtles (Janzen et al. 2000). Birth weight is plotted against

clutch membership. The clutches have been ordered according to their mean birth weight.

Dots indicate turtles who survived and red crosses indicate turtles who died. Figure inspired

by Sinharay and Stern (2005). Figure available at https://tinyurl.com/yagfxrbw under

CC license https://creativecommons.org/licenses/by/2.0/.

4.1. Stan example 1: Bayesian GLMM

The ﬁrst example features a generalized linear mixed model (GLMM) applied to the turtles

data set (Janzen et al. 2000).

11

This data set is included in the bridgesampling package and

contains information about 244 newborn turtles from 31 diﬀerent clutches. For each turtle,

the data set includes information about survival status (0 = died, 1 = survived), birth weight

in grams, and clutch (family) membership (indicated by a number between one and 31).

Figure 4 displays a scatterplot of clutch membership and birth weight. The clutches have

been ordered according to mean birth weight. Dots indicate turtles who survived and red

crosses indicate turtles who died. This data set has been analyzed in the context of Bayesian

model selection before, allowing us to compare the results from the bridgesampling package

to the results reported in the literature (e.g., Sinharay and Stern 2005; Overstall and Forster

2010).

Here we focus on the model comparison that was conducted in Sinharay and Stern (2005).

11

Data were obtained from Overstall and Forster (2010) and made available in the bridgesampling package

with permission from the original authors.

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 17

The data set was analyzed using a probit regression model of the form:

y

i

∼ B(Φ(α

0

+ α

1

x

i

+ b

clutch

i

)), i = 1, 2, . . . , N

b

j

∼ N(0, σ

2

), j = 1, 2, . . . , C,

(11)

where y

i

denotes the survival status of the i-th turtle (i.e., 0 = died, 1 = survived), x

i

denotes

the birth weight (in grams) of the i-th turtle, clutch

i

∈ {1, 2, . . . , C}, i = 1, 2, . . . , N, indicates

the clutch to which the i-th turtle belongs, C denotes the number of clutches, and b

clutch

i

denotes the random eﬀect for the clutch to which the i-th turtle belongs. Furthermore, Φ(·)

denotes the cumulative distribution function (cdf) of the normal distribution. Sinharay and

Stern (2005) investigated the question whether there is an eﬀect of clutch membership, that

is, they tested the null hypothesis H

0

: σ

2

= 0. The following priors where assigned to the

model parameters:

α

0

∼ N(0, 10),

α

1

∼ N(0, 10),

p(σ

2

) =

1 + σ

2

−2

.

(12)

Sinharay and Stern (2005) computed the Bayes factor in favor of the null hypothesis H

0

:

σ

2

= 0 versus the alternative hypothesis H

1

: p(σ

2

) =

1 + σ

2

−2

using diﬀerent methods

and they reported a “true” Bayes factor of BF

01

= 1.273 (based on extensive numerical

integration). Here we examine the extent to which we can reproduce the Bayes factor using

the bridgesampling package.

After having implemented the Stan models as character strings H0_code and H1_code, the

next step is to run Stan and obtain the posterior samples:

12

R> library("bridgesampling")

R> library("rstan")

R> data("turtles")

R> set.seed(1)

R> stanfit_H0 <- stan(model_code = H0_code,

+ data = list(y = turtles$y,

+ x = turtles$x, N = nrow(turtles)),

+ iter = 15500, warmup = 500,

+ chains = 4, seed = 1)

R> stanfit_H1 <- stan(model_code = H1_code,

+ data = list(y = turtles$y,

+ x = turtles$x, N = nrow(turtles),

+ C = max(turtles$clutch),

+ clutch = turtles$clutch),

+ iter = 15500, warmup = 500,

+ chains = 4, seed = 1)

12

The complete code can be found in the supplemental material, on the Open Science Framework (https:

//osf.io/3yc8q/), and is also available at ?turtles. Note that the results are dependent on the compiler and

the optimization settings. Thus, even with identical seeds results can diﬀer slightly from the ones reported

here.

18 bridgesampling

With these Stan objects in hand, estimates of the log marginal likelihoods are obtained by

simply passing the objects to the bridge_sampler function:

R> set.seed(1)

R> bridge_H0 <- bridge_sampler(stanfit_H0)

R> bridge_H1 <- bridge_sampler(stanfit_H1)

The Bayes factor in favor of H

0

over H

1

can then be obtained as follows:

R> bf(bridge_H0, bridge_H1)

Estimated Bayes factor in favor of bridge_H0 over bridge_H1: 1.27151

This value is close to that of 1.273 reported in Sinharay and Stern (2005). The data are only

slightly more likely under H

0

than under H

1

, suggesting that the data do not warrant strong

claims about whether or not clutch membership aﬀects survival.

The precision of the estimates for the marginal likelihoods can be obtained as follows:

R> error_measures(bridge_H0)$percentage

[1] "0.00972%"

R> error_measures(bridge_H1)$percentage

[1] "0.348%"

These error percentages indicate that both marginal likelihoods have been estimated accu-

rately, but – as expected – the marginal likelihood for the more complicated model with

random eﬀects (i.e., H

1

) has the larger estimation error.

4.2. Stan example 2: Bayesian factor analysis

The second example concerns Bayesian factor analysis. In particular, we determine the num-

ber of relevant latent factors by implementing the Bayesian factor analysis model proposed

by Lopes and West (2004). The model assumes that there are t, t = 1, 2, . . . , T , observations

on each of m variables. That is, each observation y

t

is an m-dimensional vector. The k-factor

model – where k denotes the number of factors – relates each of the T observations y

t

to

a latent k-dimensional vector f

t

which contains for observation t the values on the latent

factors, as follows:

13

y

t

| f

t

∼ N

m

(βf

t

, Σ)

f

t

∼ N

k

(0

k

, I

k

) ,

(13)

where β denotes the m × k factor loadings matrix

14

, Σ = diag

σ

2

1

, σ

2

2

, . . . , σ

2

m

denotes the

m ×m diagonal matrix with residual variances, 0

k

denotes a k-dimensional vector with zeros,

13

Note that the model assumes that the observations are zero-centered.

14

We use the original notation by Lopes and West (2004) who denoted the factor loadings matrix with a

lower-case letter. In the remainder of the article, matrices are denoted by upper-case letters.

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 19

and I

k

denotes the k × k identity matrix. Hence, conditional on the latent factors, the

observations on the m variables are assumed to be uncorrelated with each other. Marginally,

however, the observations are usually not uncorrelated and they are distributed as

y

t

∼ N

m

(0

m

, Ω) , (14)

where Ω = ββ

>

+ Σ.

Here we reanalyze a data set that contains the changes in monthly international exchange rates

for pounds sterling from January 1975 to December 1986 (West and Harrison 1997, pp. 612-

615). Currencies tracked are US Dollar (US), Canadian Dollar (CAN), Japanese Yen (JAP),

French Franc (FRA), Italian Lira (ITA), and the (West) German Mark (GER). Figure 5

displays the data.

15

Using diﬀerent computational methods, including bridge sampling, Lopes

and West (2004) estimated the marginal likelihoods and posterior model probabilities for a

factor model with one, two, and three factors. As before, this allows us to compare the

results from the bridgesampling package to the results reported in the literature. To identify

the model, the factor loading matrix β is constrained to be lower-triangular (Lopes and West

2004). The diagonal elements of β are constrained to be positive by assigning them standard

half-normal priors with lower truncation point zero: β

jj

∼ N(0, 1)

T (0,)

, j = 1, 2, . . . , k, and

the lower-diagonal elements are assigned standard normal priors. The residual variances are

assigned inverse-gamma priors of the form σ

2

i

∼ Inverse-Gamma(ν/2, νs

2

/2), i = 1, 2, . . . , m,

where ν = 2.2 and νs

2

= 0.1 (for details, see Lopes and West 2004).

The ﬁrst step in our reanalysis is to specify the Stan model as the character string model_code.

We can then ﬁt the three models corresponding to k = 1, k = 2, and k = 3 latent factors and

estimate the log marginal likelihoods using bridgesampling as follows:

16

R> library("rstan")

R> library("bridgesampling")

R> data("ier")

R> cores <- 4

R> options(mc.cores = cores) # for parallel MCMC chains

R> model <- stan_model(model_code = model_code) # compile model

R> set.seed(1)

R> stanfit <- bridge <- vector("list", 3)

R> for (k in 1:3) {

+ stanfit[[k]] <- sampling(model,

+ data = list(Y = ier, T = nrow(ier),

+ m = ncol(ier), k = k),

+ iter = 11000, warmup = 1000, chains = 4,

+ init = init_fun(nchains = 4, k = k,

+ m = ncol(ier)),

15

Each series has been standardized with respect to its sample mean and standard deviation. These stan-

dardized data are included in the bridgesampling package.

16

The complete code can be found in the supplemental material, on the Open Science Framework (https:

//osf.io/3yc8q/), and is also available at ?ier. Note that we specify initial values using a custom init_fun

function. This function may need to be changed for diﬀerent applications. Furthermore, it is strongly advised

to check that the chains have indeed converged since we sometimes encountered convergence issues with this

model. Note that the results are dependent on the compiler and the optimization settings. Thus, even with

identical seeds results can diﬀer slightly from the ones reported here.

20 bridgesampling

1975 1978 1981 1984 1987

-4

-2

0

2

4

Year

Exchange Rate Changes

US Dollar

1975 1978 1981 1984 1987

-4

-2

0

2

4

Year

Exchange Rate Changes

Canadian Dollar

1975 1978 1981 1984 1987

-4

-2

0

2

4

Year

Exchange Rate Changes

Yen

1975 1978 1981 1984 1987

-4

-2

0

2

4

Year

Exchange Rate Changes

Franc

1975 1978 1981 1984 1987

-4

-2

0

2

4

Year

Exchange Rate Changes

Lira

1975 1978 1981 1984 1987

-4

-2

0

2

4

Year

Exchange Rate Changes

Mark

Figure 5: Changes in monthly international exchange rates for pounds sterling from January

1975 to December 1986 (West and Harrison 1997, pp. 612 – 615). Currencies tracked are

US Dollar (US), Canadian Dollar (CAN), Japanese Yen (JAP), French Franc (FRA), Italian

Lira (ITA), and the (West) German Mark (GER). Each series has been standardized with

respect to its sample mean and standard deviation. Figure reproduced from Lopes and West

(2004). Figure available at https://tinyurl.com/ybtdddyv under CC license https://

creativecommons.org/licenses/by/2.0/.

+ cores = cores, seed = 1)

+ bridge[[k]] <- bridge_sampler(stanfit[[k]], method = "warp3",

+ repetitions = 10, cores = cores)

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 21

+ }

Note that in this example, we use the "warp3" method instead of the "normal" method.

Furthermore, since the error_measures function cannot be used when the estimate has been

obtained using method = "warp3" with repetitions = 1, we set repetitions = 10 to ob-

tain an empirical estimate of the estimation uncertainty (conditional on the posterior samples).

We also select parallel computation by setting cores = 4. The summary method provides a

convenient overview of the estimate and the estimation uncertainty. For instance, for the

2-factor model, we obtain as output:

R> summary(bridge[[2]])

Bridge sampling log marginal likelihood estimate

(method = "warp3", repetitions = 10):

-903.4522

Error Measures:

Min: -903.4565

Max: -903.4481

Interquartile Range: 0.002682305

Note:

All error measures are based on 10 estimates.

Table 2 displays for each of the three factor models (i.e., k = 1, k = 2, k = 3) the median

log marginal likelihood (logml) across repetitions, the minimum/maximum log marginal like-

lihood across repetitions, and the log marginal likelihood value reported in Lopes and West

(2004) based on bridge sampling. Note that the negative inﬁnity reported by Lopes and West

(2004) might be due to a numerical problem. For the 1-factor model and the 2-factor model,

the log marginal likelihoods obtained via bridgesampling are very similar to the ones reported

in Lopes and West (2004). Furthermore, the narrow range of the estimates indicates that the

estimation uncertainty is small (conditional on the posterior samples, as described above).

To examine the support for the three diﬀerent models (i.e., diﬀerent numbers of latent factors),

we can use the post_prob function to compute posterior model probabilities. By default, the

function assumes that all models are equally likely a priori; this can be adjusted using the

prior_prob argument. Furthermore, the model_names argument can optionally be used to

provide names for the models. Here we use the default of equal prior model probabilities and

we obtain:

R> post_prob(bridge[[1]], bridge[[2]], bridge[[3]],

+ model_names = c("k = 1", "k = 2", "k = 3"))

k = 1 k = 2 k = 3

[1,] 6.278942e-49 0.8435919 0.1564081

[2,] 6.309963e-49 0.8491811 0.1508189

22 bridgesampling

Table 2: Log marginal likelihood (logml) estimates for the k = 1, k = 2, and k = 3 factor

model. The rightmost column displays the values based on bridge sampling reported in Lopes

and West (2004).

Number of Factors Median Logml Min Logml Max Logml Lopes & West

k = 1 -1014.271 -1014.273 -1014.269 -1014.5

k = 2 -903.452 -903.457 -903.448 -903.7

k = 3 -905.271 -905.454 -905.138 −∞

[3,] 6.373407e-49 0.8554668 0.1445332

[4,] 6.511718e-49 0.8739641 0.1260359

[5,] 6.582895e-49 0.8805172 0.1194828

[6,] 6.384273e-49 0.8596401 0.1403599

[7,] 6.469723e-49 0.8736989 0.1263011

[8,] 6.403270e-49 0.8616183 0.1383817

[9,] 6.426132e-49 0.8635907 0.1364093

[10,] 6.417346e-49 0.8592737 0.1407263

Each row presents the posterior model probabilities based on one repetition of the bridge

sampling procedure for all three models (i.e., each row sums to one). Hence, there are as

many rows as repetitions.

17

The 2-factor model receives most support from the observed

data. This is in line with Lopes and West (2004), who also preferred the 2-factor model;

18

based on the factor loadings, they proposed the presence of a North American factor and a

European Union factor.

5. Discussion

This paper introduced bridgesampling, an R package for computing marginal likelihoods,

Bayes factors, posterior model probabilities, and normalizing constants in general. We have

demonstrated how researchers can use bridgesampling to conduct Bayesian model comparisons

in a generic, user-friendly way: researchers need only provide posterior samples, a function

that computes the log of the unnormalized posterior density, the data, and lower and upper

bounds for the parameters. Furthermore, we have described the Stan interface which makes it

even easier to obtain the marginal likelihood: researchers need only provide a stanfit object

and the bridgesampling package will automatically produce an estimate of the log marginal

likelihood.

19

In other words, the bridgesampling package makes it possible to obtain marginal

likelihood estimates for any model that can be implemented in Stan (in a way that retains the

17

Note that the output of the post_prob function can be directly passed to the boxplot function which

allows one to visualize the estimation uncertainty in the posterior model probabilities across repetitions.

18

Note that Lopes and West (2004) report a posterior model probability of 1 for the 2-factor model. However,

this estimate may be inﬂated by the inﬁnite log marginal likelihood value for the 3-factor model.

19

Similar to the stanfit method, the bridge_sampler method for nimble only requires the ﬁtted object (of

class MCMC_refClass) and extracts all necessary information for computing the marginal likelihood (including

the function for computing the unnormalized log posterior density and the parameter bounds). However, at

the time of writing we have not yet tested this method in the same intensity as the stanfit method. We will

add a vignette describing the nimble interface in more detail when we have done so.

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 23

constants). By combining the Stan state-of-the-art No-U-Turn sampler with bridgesampling,

researchers are provided with a general purpose, easy-to-use computational solution to the

challenging task of comparing complex Bayesian models.

As practical advice, we recommend to keep the following four points in mind when using

the bridgesampling package (see also Gronau et al. 2017c, 2018). First, one should always

check the posterior samples carefully. A successful application of bridge sampling requires a

suﬃcient number of representative samples from the posterior distribution. Thus, it is im-

portant to use eﬃcient sampling algorithms and, in case of MCMC sampling, it is crucial

that researchers conﬁrm that the chains have converged to the joint posterior distribution.

In addition, researchers need to make sure that the model does not contain any discrete pa-

rameters since those are currently not supported. This may sound more restrictive than it is.

In practice the solution is to marginalize out the discrete parameters, something that is often

possible. Note the similarity to Stan which also deals with discrete parameters by marginal-

izing them out (Stan Development Team 2017, section 15). Furthermore, as demonstrated

in the examples, for conducting model comparisons based on bridge sampling, the number of

posterior samples often needs to be an order of magnitude larger than for estimation. This

of course depends on a number of factors such as the complexity of the model of interest,

the number of posterior samples that one usually uses for estimation, the posterior sampling

algorithm used, and also the accuracy of the marginal likelihood estimate that one desires to

achieve.

Second, one should always assess the uncertainty of the bridge sampling estimate. In case the

uncertainty is deemed too high, one can attempt to achieve a higher precision by increasing

the number of posterior samples or, in case method = "normal", by using the more sophis-

ticated method = "warp3" instead (see the third point below). Users of the bridgesampling

package have diﬀerent options for assessing the estimation uncertainty. In our opinion, the

“gold standard” may be to obtain an empirical uncertainty assessment by repeating the bridge

sampling procedure multiple times, each time using a fresh set of posterior samples. This ap-

proach allows users to assess the uncertainty directly for the quantity of interest. For instance,

if the focus is on computing a Bayes factor, users may repeat the following steps: (a) obtain

posterior samples for both models, (b) use the bridge_sampler function to estimate the log

marginal likelihoods, (c) compute the Bayes factor using the bf function. The variability of

these Bayes factor estimates across repetitions then provides an assessment of the uncertainty.

For certain applications, this approach may be infeasible due to computational restrictions. If

this is the case and method = "normal", we recommend to use the approximate errors based

on Fr

¨

uhwirth–Schnatter (2004) which are available through the error_measures function.

As mentioned before, we have found these approximate errors to work well for method =

"normal", but not for method = "warp3" which is the reason why they are not available for

the latter method. Alternatively, one can also assess the estimation uncertainty by setting

the repetitions argument to an integer larger than one. This provides an assessment of the

estimation uncertainty due to variability in the samples from the proposal distribution, but

it should be kept in mind that this does not take into account variability in the posterior

samples.

Third, one should consider whether using the more time-consuming Warp-III method may

be beneﬁcial. The accuracy of the estimate is governed not only be the number of samples,

but also by the overlap between the posterior and the proposal distribution (e.g., Meng and

Wong 1996; Meng and Schilling 2002). The bridgesampling package attempts to maximize

24 bridgesampling

this overlap by (a) focusing on one marginal likelihood at a time which allows one to use

a convenient proposal distribution which closely resembles the posterior distribution, (b)

using a proposal distribution which matches the mean vector and covariance matrix of the

posterior samples (i.e., method = "normal") or additionally also the skewness (i.e., method =

"warp3"). Consequently, as mentioned before, method = "warp3" will always be as precise

or more precise than method = "normal"; however, it also takes about twice as long. We

have found that in many applications, method = "normal" works well, however, in case the

posterior is skewed (crucially, this refers to the joint posterior of the quantities that have

been transformed to the real line), method = "warp3" may be the better choice. When in

doubt, we believe that it may be beneﬁcial to also explore the Warp-III results – if this is

computationally feasible – to see how much (if any) improvement in precision is achieved by

taking into account potential skewness. It should be kept in mind that, in case the posterior

distribution exhibits multiple modes, the overlap of the two distributions may still be subject

to improvement – even when using method = "warp3". The development of eﬃcient bridge

sampling variants for these cases is subject to ongoing research (e.g., Wang and Meng 2016;

Fr

¨

uhwirth–Schnatter 2004).

Forth, users should carefully think about the choice of prior distribution. Even though the

bridgesampling package enables researchers to compute the marginal likelihoods in an almost

black-box manner, this does not imply that the user can mindlessly exploit the package func-

tionality to conduct Bayesian model comparisons. As is apparent from Equation (4), Bayesian

model comparisons depend on the choice of the parameter prior distribution. Crucially, the

prior distribution has a lasting inﬂuence on the results. Hence, meaningful Bayesian model

comparisons require that researchers carefully consider their parameter prior distribution (e.g.,

Lee and Vanpaemel 2018), engage in sensitivity analyses, or use default prior choices that have

certain desirable properties such as model selection and information consistency (e.g., Jeﬀreys

1961; Bayarri et al. 2012; Ly et al. 2016).

20

Thus, the bridgesampling package removes the

computational hurdle of obtaining the marginal likelihood, thereby allowing researchers to

spend more time and eﬀort on the speciﬁcation of meaningful prior distributions.

It should also be kept in mind that there may be cases in which the bridge sampling proce-

dure may not be the ideal choice for conducting Bayesian model comparisons. For instance,

when the models are nested it might be faster and easier to use the Savage-Dickey density

ratio (Dickey and Lientz 1970; Wagenmakers et al. 2010). Another example is when the com-

parison of interest concerns a very large model space, and a separate bridge sampling based

computation of marginal likelihoods may take too much time. In this scenario, Reversible

Jump MCMC (Green 1995) may be more appropriate. The downside of Reversible Jump

MCMC is that it is usually problem-speciﬁc and cannot easily be applied in a generic fashion

to diﬀerent nested and non-nested model comparison problems (but see Gelling et al. 2017).

The goal with the bridgesampling package, however, was exactly that: to provide users with

a generic way of computing marginal likelihoods which can in principle be applied to any

Bayesian model comparison problem.

In the future, we hope that it may be possible to add bridgesampling support for a number

of R packages, such as the MCMCglmm package (Hadﬁeld 2010), the JAGS interface of the

20

Note that in the ﬁrst example (i.e., the Bayesian t-test) we have used prior distributions which lead to

these desirable properties. However, in the second and third example, we simply used the prior distributions

that have been used in the literature so that we could compare our results to the reported results.

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 25

mgcv package (Wood 2016), the glmmBUGS package (Brown and Zhou 2018), or the blavaan

21

package (Merkle and Rosseel 2018) so that users could conduct Bayesian model comparisons in

a black box way similar to the Stan interface. For packages that use themselves Stan for ﬁtting

the models, adding bridgesampling support is relatively straightforward: the only potential

change that would have to be implemented is to make sure that the models are coded such that

all constants are retained (as explained in section 4). Once this is achieved, computing the

relevant quantities via bridgesampling works as described in the Stan examples. For packages

that do not use Stan to ﬁt the models, the main diﬃculty is specifying the unnormalized

posterior density function and the parameter bounds in an automatized way. This is also

the reason why there is currently no black box interface to JAGS since, to the best of our

knowledge, specifying these quantities in an automatized way is not trivial. Nevertheless, if

this hurdle could be overcome, adding bridgesampling support would be straightforward.

In sum, the bridgesampling package provides a generic, accurate, easy-to-use, automatic,

and fast way of computing marginal likelihoods and conducting Bayesian model comparisons.

With the computational challenge all but overcome, researchers can spend more time and

eﬀort on addressing the conceptual challenge that comes with Bayesian model comparisons:

specifying prior distributions that are either robust or meaningful.

6. Acknowledgements

This research was supported by a Netherlands Organisation for Scientiﬁc Research (NWO)

grant to QFG (406.16.528), by an NWO Vici grant to EJW (016.Vici.170.083), and by the

Berkeley Initiative for Transparency in the Social Sciences, a program of the Center for

Eﬀective Global Action (CEGA), with support from the Laura and John Arnold Foundation.

References

Bayarri MJ, Berger JO, Forte A, Garc´ıa-Donato G (2012). “Criteria for Bayesian Model

Choice With Application to Variable Selection.” The Annals of Statistics, 40, 1550–1577.

Brown PE, Zhou L (2018). glmmBUGS: Generalised Linear Mixed Models and Spatial

Models with WinBUGS, JAGS, and OpenBUGS. R package version 2.4.2, URL https://

CRAN.R-project.org/package=glmmBUGS.

Carlin BP, Chib S (1995). “Bayesian Model Choice via Markov Chain Monte Carlo Methods.”

Journal of the Royal Statistical Society B, 57, 473–484.

Carpenter B, Gelman A, Hoﬀman M, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo

J, Li P, Riddell A (2017). “Stan: A Probabilistic Programming Language.” Journal of

Statistical Software, 76, 1–32.

Chib S (1995). “Marginal Likelihood from the Gibbs Output.” Journal of the American

Statistical Association, 90, 1313–1321.

21

Note that the blavaan package already provides approximate marginal likelihoods for the models that are

obtained via a Laplace approximation.

26 bridgesampling

Cushny AR, Peebles AR (1905). “The Action of Optical Isomers: II Hyoscines.” The Journal

of Physiology, 32, 501–510.

de Valpine P, Turek D, Paciorek CJ, Anderson-Bergman C, Lang DT, Bodik R (2017).

“Programming with Models: Writing Statistical Algorithms for General Model Structures

with NIMBLE.” Journal of Computational and Graphical Statistics, 26, 403–413. doi:

10.1080/10618600.2016.1172487.

Denwood MJ (2016). “runjags: An R Package Providing Interface Utilities, Model Templates,

Parallel Computing Methods and Additional Distributions for MCMC Models in JAGS.”

Journal of Statistical Software, 71(9), 1–25. doi:10.18637/jss.v071.i09.

Dickey JM, Lientz BP (1970). “The Weighted Likelihood Ratio, Sharp Hypotheses about

Chances, the Order of a Markov Chain.” The Annals of Mathematical Statistics, 41, 214–

226.

Etz A, Wagenmakers EJ (2017). “J.B.S Haldane’s Contribution to the Bayes Factor Hypoth-

esis Test.” Statistical Science, 32, 313–329.

Fr

¨

uhwirth–Schnatter S (2004). “Estimating Marginal Likelihoods for Mixture and Markov

Switching Models Using Bridge Sampling Techniques.” Econometrics Journal, 7, 143–167.

Gamerman D, Lopes HF (2006). Markov Chain Monte Carlo: Stochastic Simulation for

Bayesian Inference. Chapman & Hall/CRC, Boca Raton, FL.

Gelling N, Schoﬁeld MR, Barker RJ (2017). rjmcmc: Reversible-Jump MCMC Using

Post-Processing. R package version 0.3.2, URL https://CRAN.R-project.org/package=

rjmcmc.

Gelman A, Meng XL (1998). “Simulating Normalizing Constants: From Importance Sampling

to Bridge Sampling to Path Sampling.” Statistical Science, 13, 163–185.

Green PJ (1995). “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian

Model Determination.” Biometrika, 82, 711–732.

Gronau QF, Heathcote A, Matzke D (2018). “Computing Bayes Factors for Evidence-

Accumulation Models Using Warp-III Bridge Sampling.” Manuscript submitted for pub-

lication and uploaded to PsyArXiv. URL https://psyarxiv.com/9g4et.

Gronau QF, Ly A, Wagenmakers EJ (2017a). “Informed Bayesian T -Tests.” Manuscript

submitted for publication and uploaded to arXiv. URL https://arxiv.org/abs/1704.

02479.

Gronau QF, Sarafoglou A, Matzke D, Ly A, Boehm U, Marsman M, Leslie DS, Forster JJ,

Wagenmakers EJ, Steingroever H (2017b). “A Tutorial on Bridge Sampling.” Journal of

Mathematical Psychology, 81, 80–97. URL https://doi.org/10.1016/j.jmp.2017.09.

005.

Gronau QF, Wagenmakers EJ, Heck DW, Matzke D (2017c). “A Simple Method for Compar-

ing Complex Models: Bayesian Model Comparison for Hierarchical Multinomial Processing

Tree Models Using Warp-III Bridge Sampling.” Manuscript submitted for publication and

uploaded to PsyArXiv. URL https://psyarxiv.com/yxhfm.

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 27

Hadﬁeld JD (2010). “MCMC Methods for Multi-Response Generalized Linear Mixed Models:

The MCMCglmm R Package.” Journal of Statistical Software, 33(2), 1–22. URL http:

//www.jstatsoft.org/v33/i02/.

Hankin RKS (2007). “Very Large Numbers in R: Introducing Package Brobdingnag.” R News,

7.

Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999). “Bayesian Model Averaging: A

Tutorial.” Statistical Science, 14, 382–417.

Hoﬀman MD, Gelman A (2014). “The No-U-Turn Sampler: Adaptively Setting Path Lengths

in Hamiltonian Monte Carlo.” Journal of Machine Learning Research, 15, 1593–1623.

Janzen FJ, Tucker JK, Paukstis GL (2000). “Experimental Analysis of an Early Life-History

Stage: Selection on Size of Hatchling Turtles.” Ecology, 81, 2290–2304.

Jeﬀerys WH, Berger JO (1992). “Ockham’s Razor and Bayesian Analysis.” American Scientist,

80, 64–72.

Jeﬀreys H (1939). Theory of Probability. 1st edition. Oxford University Press, Oxford, UK.

Jeﬀreys H (1961). Theory of Probability. 3rd edition. Oxford University Press, Oxford, UK.

Kass RE, Raftery AE (1995). “Bayes Factors.” Journal of the American Statistical Association,

90, 773–795.

Knaus J (2015). snowfall: Easier Cluster Computing (Based on snow). R package version

1.84-6.1, URL https://CRAN.R-project.org/package=snowfall.

Lartillot N, Philippe H (2006). “Computing Bayes Factors Using Thermodynamic Integra-

tion.” Systematic Biology, 55, 195–207.

Lee MD, Vanpaemel W (2018). “Determining Informative Priors for Cognitive Models.” Psy-

chonomic Bulletin & Review, 25, 114–127.

Lodewyckx T, Kim W, Lee MD, Tuerlinckx F, Kuppens P, Wagenmakers EJ (2011). “A Tuto-

rial on Bayes Factor Estimation with the Product Space Method.” Journal of Mathematical

Psychology, 55, 331–347.

Lopes HF, West M (2004). “Bayesian Model Assessment in Factor Analysis.” Statistica Sinica,

14, 41–67.

Ly A, Verhagen AJ, Wagenmakers EJ (2016). “Harold Jeﬀreys’s Default Bayes Factor Hy-

pothesis Tests: Explanation, Extension, and Application in Psychology.” Journal of Math-

ematical Psychology, 72, 19–32.

Meng XL, Schilling S (2002). “Warp Bridge Sampling.” Journal of Computational and Graph-

ical Statistics, 11, 552–586.

Meng XL, Wong WH (1996). “Simulating Ratios of Normalizing Constants via a Simple

Identity: A Theoretical Exploration.” Statistica Sinica, 6, 831–860.

28 bridgesampling

Merkle EC, Rosseel Y (2018). “blavaan: Bayesian Structural Equation Models via Parameter

Expansion.” Journal of Statistical Software, 85(4), 1–30. doi:10.18637/jss.v085.i04.

Morey RD, Rouder JN (2015). “BayesFactor 0.9.11-1.” Comprehensive R Archive Network.

URL http://cran.r-project.org/web/packages/BayesFactor/index.html.

Myung IJ, Pitt MA (1997). “Applying Occam’s Razor in Modeling Cognition: A Bayesian

Approach.” Psychonomic Bulletin & Review, 4, 79–95.

Overstall AM (2010). Default Bayesian Model Determination for Generalised Linear Mixed

Models. Ph.D. thesis, University of Southampton. URL https://eprints.soton.ac.uk/

170229/.

Overstall AM, Forster JJ (2010). “Default Bayesian Model Determination Methods for Gen-

eralised Linear Mixed Models.” Computational Statistics & Data Analysis, 54, 3269–3288.

Owen A, Zhou Y (2000). “Safe and Eﬀective Importance Sampling.” Journal of the American

Statistical Association, 95, 135 – 143.

Plummer M (2003). “JAGS: A Program for Analysis of Bayesian Graphical Models Using Gibbs

Sampling.” In K Hornik, F Leisch, A Zeileis (eds.), Proceedings of the 3rd International

Workshop on Distributed Statistical Computing. Vienna, Austria.

Plummer M (2016). rjags: Bayesian Graphical Models Using MCMC. R package version 4-6,

URL https://CRAN.R-project.org/package=rjags.

Plummer M, Best N, Cowles K, Vines K (2006). “coda: Convergence Diagnosis and Output

Analysis for MCMC.” R News, 6, 7–11.

R Core Team (2016). R: A Language and Environment for Statistical Computing. R Founda-

tion for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Stan Development Team (2016a). “rstan: The R interface to Stan.” R package version 2.14.1,

URL http://mc-stan.org/.

Stan Development Team (2016b). “rstanarm: Bayesian Applied Regression Modeling via

Stan.” R package version 2.13.1, URL http://mc-stan.org/.

Stan Development Team (2017). Stan Modeling Language Users Guide and Reference Manual,

Version 2.16.0. URL http://mc-stan.org.

Rouder JN, Speckman PL, Sun D, Morey RD, Iverson G (2009). “Bayesian T Tests for

Accepting and Rejecting the Null Hypothesis.” Psychonomic Bulletin & Review, 16, 225–

237.

Sinharay S, Stern HS (2005). “An Empirical Comparison of Methods for Computing Bayes

Factors in Generalized Linear Mixed Models.” Journal of Computational and Graphical

Statistics, 14, 415–435.

Su YS, Yajima M (2015). R2jags: Using R to Run JAGS. R package version 0.5-7, URL

https://CRAN.R-project.org/package=R2jags.

Quentin F. Gronau, Henrik Singmann, Eric-Jan Wagenmakers 29

Vandekerckhove J, Matzke D, Wagenmakers EJ (2015). “Model Comparison and the Principle

of Parsimony.” In J Busemeyer, J Townsend, ZJ Wang, A Eidels (eds.), Oxford Handbook

of Computational and Mathematical Psychology, pp. 300–319. Oxford University Press.

Wagenmakers EJ, Lodewyckx T, Kuriyal H, Grasman R (2010). “Bayesian Hypothesis Testing

for Psychologists: A Tutorial on the Savage–Dickey Method.” Cognitive Psychology, 60,

158–189.

Wang L, Meng XL (2016). “Warp Bridge Sampling: The Next Generation.” arXiv preprint

arXiv:1609.07690.

West M, Harrison J (1997). Bayesian Forecasting and Dynamic Models. 2nd edition. Springer-

Verlag, New York.

Wood S (2016). “Just Another Gibbs Additive Modeler: Interfacing JAGS and mgcv.” Journal

of Statistical Softwares, 75(7), 1–15. ISSN 1548-7660. doi:10.18637/jss.v075.i07. URL

https://www.jstatsoft.org/v075/i07.

Aﬃliation:

Quentin F. Gronau

Department of Psychological Methods

University of Amsterdam

Nieuwe Achtergracht 129 B

1018 WT Amsterdam, The Netherlands

E-mail: [email protected]