Chapter 7
Sample Size Issues
for Conjoint Analysis
“I’m about to conduct a conjoint analysis study. How large a sample size do I
need? What will be the margin of error of my estimates if I use a sample of only
100 respondents?” These are common questions. Unfortunately, they are difficult
questions to answer because many issues come into play:
What is it exactly that you are trying to measure to get a statistically signifi-
cant result: a specific part-worth, preference for a product, or the difference
in preference between groups of people?
Do you expect that the differences between features/products/groups you
are trying to detect are subtle or strong?
What level of certainty do you need to be able to act upon your conclu-
sions: 99% confidence, 90% confidence, or what?
How large is the total population in the market for your product?
What conjoint methodology do you plan to use? How many conjoint ques-
tions will each respondent answer?
Do you need to compare subsets of respondents, or are you going to be
looking at results only as a whole?
How homogenous is your market? Do people tend to think alike, or are
there strong differences in preferences among individuals?
How do you plan to select your sample? Will it be a random sample or
convenience sample?
How large of a budget do you have for the project?
Answers to these questions play a role in determining the appropriate sample
size for a conjoint study. This chapter provides advice and tools to help conjoint
researchers make sample size decisions. It involves more statistical theory and
formulas than other chapters, so please bear with me.
57
Reprinted from Orme, B. (2010, 2019) Getting Started with Conjoint Analysis: Strategies for
Product Design and Pricing Research. Fourth Edition, Madison, Wis.: Research Publishers LLC.
© 2010, 2019 by Research Publishers LLC. No part of this work may be reproduced, stored in a
retrieval system, or transmitted in any form or by any means, mechanical, electronic, photocopying,
recording, or otherwise, without the prior written permission of the publisher.
58 Sample Size Issues for Conjoint Analysis
Though most of the principles that influence sample size determination are
based on statistics, successful researchers develop heuristics for quickly deter-
mining sample sizes based on experience, rules-of-thumb, and budget constraints.
Let us begin our discussion by making a distinction between sampling and mea-
surement error. Subsequent sections will discuss each of these sources of error.
7.1 Sampling Error versus Measurement Error
Errors are deviations from truth. In marketing research we are always concerned
with reducing error in cost-effective ways. Assuming that you have selected the
appropriate modeling method, there are two main sources of error that cause pref-
erence data to deviate from truth. The first is sampling error.
Sampling error occurs when samples of respondents deviate from the under-
lying population. If we have drawn a random sample (each population element
has an equal probability of being selected), sampling error is due to chance. If,
on the other hand, our sample is not random (for example, a convenience sam-
ple), the sampling errors may be systematic. With random sampling, we reduce
sampling error by simply increasing the sample size. With nonrandom sampling,
however, there is no guarantee that increasing sample size will make the samples
more representative of the population.
To illustrate sampling error, assume we wanted to figure out how far the av-
erage adult can throw a baseball. If we drew a random sample of thirty people,
and by chance happened to include Ichiro Suzuki (former professional baseball
outfielder), our estimate would likely be farther than the true distance for the av-
erage adult. It is important to note that the samples we use in marketing research
are rarely random. Some respondents resist being interviewed and, by selecting
themselves out of our study, are a source of nonresponse bias.
A second source of error in conjoint data is measurement error. We reduce
measurement error by having more or better data from each respondent. Consider
again the example of the baseball toss. Suppose you are one of the study partic-
ipants. You throw the ball, but you accidentally step into an uneven spot on the
ground, and the ball does not go as far as you typically could throw it. If we asked
you to take another few tosses, and averaged the results, we would reduce the
measurement error and get a better idea of how far you could throw a baseball.
In conjoint analysis, we reduce measurement error by including more conjoint
questions. We recognize, however, that respondents get tired, and there is a limit
beyond which we can no longer get reliable responses, and therefore a limit to the
amount we can reduce measurement error.
Copyright 2010, 2019 © Research Publishers LLC. All rights reserved.
7.2 Binary Variables and Proportions 59
7.2 Binary Variables and Proportions
Sampling error is expressed in terms of standard errors, confidence intervals, and
margins of error. We can begin to understand what these terms mean by con-
sidering binary variables and proportions. In fact, we will spend a good deal
of time talking about confidence intervals for proportions because the statistical
principles can be applied to choice-based conjoint results and shares of choice in
market simulations for all conjoint techniques.
A binary variable is a categorical variable with exactly two levels, such as a
yes/no item on a consumer survey or a true/false checklist item. Many product
attributes in conjoint studies have exactly two levels. And consumer choice itself
is binary—to choose or not, to buy or not. Binary variables are usually coded as
1 for yes and 0 for no. Looking across a set of binary variables, we see a set of 1s
and 0s. We can count the number of 1s, and we can compute the proportion of 1s,
which is the number of 1s divided by the sample size n.
In statistical theory, the sampling distribution of the proportion is obtained
by taking repeated random samples from the population and computing the pro-
portion for each sample. The standard error of the proportion is the standard
deviation of these proportions across the repeated samples. The standard error of
a proportion is given by the following formula:
standard error of a proportion =
r
pq
(n 1)
where p is the sample estimate of the proportion in the population, q = (1 p),
and n is the sample size.
Most of us are familiar with the practice of reporting the results of opinion
polls. Typically, a report may say something like this: “If the election were held
today, Mike Jackson is projected to capture 50 percent of the vote. The survey
was conducted by the XYZ company and has a margin of error of ±3 percent.
What is margin of error?
Margin of error refers to the upper and lower limits of a confidence interval.
If we use what is known as the normal approximation to the binomial, we can
obtain upper and lower limits of the 95% confidence interval for the proportion as
margin of error for a proportion = ±1.96
r
pq
(n 1)
Going back to the polling report from XYZ company, we note that margin of
error has a technical meaning in classical statistics. If XYZ were to repeat the poll
a large number of times (with a different random sample each time), 95 percent
of the confidence intervals associated with these samples would contain the true
proportion in the population. But, of course, 5 percent of the confidence intervals
would not contain the true proportion in the population. Confidence intervals are
random intervals. Their upper and lower limits vary from one sample to the next.
Copyright 2010, 2019 © Research Publishers LLC. All rights reserved.
60 Sample Size Issues for Conjoint Analysis
Suppose we interview 500 respondents (drawn using scientific random sam-
pling) and ask whether they approve of the president’s job performance, and sup-
pose 65 percent say yes. What would be the margin of error of this statistic? We
would compute the interval as follows:
±1.96
s
(0.65)(0.35)
(500 1)
= ±0.042
The margin of error is ±4.2 percent for a confidence interval from 60.8 to 69.2
percent. We expect 95 percent of the confidence intervals constructed in this way
to contain the true value of the population proportion.
Note that the standard error of the proportion varies with the size of the pop-
ulation proportion. So when there is agreement among people about a yes/no
question on a survey, the value of p is closer to one or zero, and the standard error
of the proportion is small. When there is disagreement, the value of p is closer to
0.50, and the standard error of the proportion is large. For any given sample size
n, the largest value for the standard error occurs when p = 0.50.
When computing confidence intervals for proportions, then, the most conser-
vative approach is to assume that the value of the population proportion is 0.50.
That is, for any given sample size and confidence interval type, p = 0.50 will
provide the largest standard error and the widest margin of error. Binary variables
and proportions have this special property—for any given sample size n and con-
fidence interval type, we know the maximum margin of error before we collect
the data. The same cannot be said for continuous variables, which we discuss in
the next section.
7.3 Continuous Variables and Means
With continuous variables (ratings-based responses to conjoint profiles), one can-
not estimate the standard error before fielding a study. The standard error of
the mean is directly related to the standard deviation of the continuous variable,
which differs from one study to the next and from one survey question to the next.
Assuming a normal distribution, the standard error of the mean is given by
standard error of the mean =
standard deviation
n
And the margin of error associated with a 95% confidence interval for the mean
is given by
margin of error for the mean = ±1.96
standard error of the mean
Copyright 2010, 2019 © Research Publishers LLC. All rights reserved.
7.4 Small Populations and the Finite Population Correction 61
Suppose we had conducted a ratings-based study with forty respondent in-
terviews. We want to estimate purchase likelihood for a client’s planned product
introduction with a margin of error of ±3 and a 95% confidence level. We run a
market simulation to estimate purchase likelihood on a 100-point scale, and the
simulator reports the standard error next to the purchase likelihood estimate:
Total Respondents = 40
Purchase Standard
Likelihood Error
Product A 78.34 3.06
The margin of error is ±1.96 × 3.06 = ±6.00, so we need to cut the margin of
error in half to achieve our ±3 target level of precision. We know that the standard
error of the mean is equal to the standard deviation divided by the square-root
of the sample size. To decrease the standard error by a factor of two, we must
increase sample size by a factor of four. Therefore, we need to interview about
40 × 4 = 160 or 120 additional respondents to obtain a margin of error of ±3 for
purchase likelihood.
7.4 Small Populations and the Finite Population Correction
The examples we have presented thus far have assumed infinite or very large
populations. But suppose that, instead of estimating the job performance rating of
the president by the United States population at large, we wanted to estimate (with
a margin of error of ±3 percent) the job performance rating of a school principal
by members of the PTA. Suppose there are only 100 members of the PTA. How
many PTA members do we need to interview to achieve a margin of error of ±3
percent for our estimate?
First, we introduce a new term: finite population correction. The formula
for the finite population correction is
(Nn)
(N1)
, where n is the sample size and N
is the population size. The formula for the finite population correction is often
simplified to (1 f), where f =
n
N
, which is approximately equivalent to
(Nn)
(N1)
for all except the smallest of populations.
After a population reaches about 5,000 individuals, one can generally ignore
the finite population correction factor because it has a very small impact on sam-
ple size decisions. Using the simplified finite population correction for a finite
sample, the margin of error for a proportion and a 95% confidence interval is
equal to
±1.96
r
(1 f)
pq
(n 1)
The finite population correction may also be used for continuous variables and
means.
Copyright 2010, 2019 © Research Publishers LLC. All rights reserved.
62 Sample Size Issues for Conjoint Analysis
With a population of 100, we can solve for n assuming an expected propor-
tion. The worst-case scenario (i.e., the one that has the largest standard error) is
for a 0.50 proportion, so it is standard to let p = 0.50. Solving for n, we discover
that we would need to interview 92 PTA members, or 92 percent of the population
to achieve a margin of error of ±3 percent.
The important point to be made is that with small populations, you may have
to interview a significant proportion of the population to achieve stable estimates.
Suppose your client produces a very expensive, highly specialized piece of ma-
chinery, for which there were only 100 total potential customers in the world.
Given many people’s unwillingness to complete surveys, it will likely be much
more difficult to complete surveys with 92 out of 100 potential buyers of this
product than to interview, say, 1,000 potential buyers of something like office
chairs, for which there are so many buyers as to approximate an infinite popula-
tion. Even so, in terms of estimating a proportion, both scenarios lead to the same
margin of error when projecting to the population of interest.
Conjoint studies may be used for large or small populations. We can use con-
joint analysis for even the smallest of populations, provided we interview enough
respondents to represent the population adequately.
7.5 Measurement Error in Conjoint Studies
Many researchers and dozens of data sets have demonstrated that conjoint util-
ities do a good job of predicting individual respondents’ preferences for prod-
ucts. Holdout choice sets (choice tasks not used to estimate utilities) are often
included in conjoint questionnaires. Using the conjoint data, a respondent’s hold-
out choices usually can be predicted with a hit rate of roughly 75 to 85 percent.
These choice tasks typically include between three and five different product con-
cepts, so by chance we expect a success rate between 20 and 33 percent.
The hit rates with conjoint are significantly greater than chance and signifi-
cantly better than the marketer’s best guesses—even if the marketer knows each
customer very well. In fact, conjoint predictions at the individual level frequently
approach or sometimes even exceed test-retest reliability, suggesting that a good
set of conjoint utilities is about as reliable at predicting choices to repeated hold-
out tasks as the respondents’ earlier choices.
If there were only one buyer of your product in the world, you could learn
a great deal about that individual’s preferences from a conjoint interview. The
utility data would be reasonably accurate for predicting his or her preferences
and weights placed upon attributes. We can learn a great deal about an individ-
ual respondent provided we ask that respondent the right questions and enough
questions. Let us consider numbers of conjoint questions or tasks needed for
alternative methods of conjoint analysis.
Copyright 2010, 2019 © Research Publishers LLC. All rights reserved.
7.5 Measurement Error in Conjoint Studies 63
Adaptive Conjoint Methods
Adaptive Conjoint Analysis (ACA) and Adaptive Choice-Based Conjoint (ACBC)
methods result in a sets of utilities for each individual. We want conjoint mea-
surements for each individual in the study to be as accurate as possible.
Of the conjoint methods discussed in this book, ACA and ACBC are perhaps
the best at reducing measurement error. These interviews adapt to the respondent,
asking questions designed to be relevant and efficient for refining utility estimates.
If your sample size is particularly small and the number of attributes to mea-
sure is large, ACA or ACBC may be better tools to use. In fact, it is possible
to have an entire research study designed to learn about the preferences of one
respondent, such as an important buyer of an expensive industrial product. As
we discussed in chapter 5, there are many considerations for determining whether
ACA or ACBC is appropriate for a study.
Traditional Conjoint Studies
Traditional full-profile conjoint (such as Sawtooth Software’s CVA or SPSS’s
conjoint module) usually leads to the estimation of individual-level part-worth
utilities. Again, the minimum sample size is one.
One should include enough conjoint questions or cards to reduce measure-
ment error sufficiently. Sawtooth Software’s CVA manual suggests asking enough
questions to obtain three times the number of observations as parameters to be es-
timated, or a number equal to 3(K k + 1), where K is the total number of levels
across all attributes and k is the number of attributes.
Respondents sometimes lack the energy or patience to answer many ques-
tions. We need to strike a good balance between overworking the respondent (and
getting noisy data) and not asking enough questions to stabilize the estimates.
Choice-Based Conjoint
Though generally considered more realistic than traditional conjoint, choice-based
questions are a relatively inefficient way to learn about preferences. As a result,
sample sizes are typically larger than with adaptive or traditional ratings-based
conjoint, and choice-based conjoint (CBC) results have traditionally been ana-
lyzed by aggregating respondents. Since the late 1990s, hierarchical Bayes has
permitted individual-level estimation of part-worth utilities from CBC data. But
to compute individual-level models, HB uses information from many respondents
to refine the utility estimates for each individual. Therefore, one usually does not
calculate utilities using a sample size of one. It should be noted, however, that
logit analysis can be run at the individual level, if the number of parameters to be
estimated is small, the design is highly efficient, and the number of tasks is large.
There are rules-of-thumb for determining sample sizes for CBC if we are will-
ing to assume aggregate estimation of effects. Like proportions, choices reflect
binary data, and the rules for computing confidence intervals for proportions are
well defined and known prior to collecting data.
Copyright 2010, 2019 © Research Publishers LLC. All rights reserved.
64 Sample Size Issues for Conjoint Analysis
Consider a design with three brands and three prices. Assume each person
completes ten tasks, and each task displays three products (i.e., each brand and
price occurs once per task). If we interview 100 respondents, each brand will have
been available for choice
(100 respondents) ×(10 tasks) ×
(3 concepts)
(3 brands)
= 1000 times
Johnson and Orme (1996) looked at about twenty commercial choice-based
conjoint data sets and determined that having each respondent complete ten tasks
is about as good at reducing error as having ten times as many respondents com-
plete one task. Of course, in the limit this suggestion is ridiculous. It does not
make sense to say that having one respondent complete 1,000 tasks is as good as
having 1,000 respondents complete one task. But, according to Johnson and Orme
(1996) simulation results, if a researcher obtains data from three to four hundred
respondents, doubling the number of tasks they complete is about as good (in
terms of reducing overall error) as doubling the sample size. It makes sense from
a cost-benefit standpoint, then, to have respondents complete many choice tasks.
Johnson, who is the author of Sawtooth Software’s CBC System, has recom-
mended a rule-of-thumb when determining minimum sample sizes for aggregate-
level full-profile CBC modeling: set
nta
c
500
where n is the number of respondents, t is the number of tasks, a is number of
alternatives per task (not including the none alternative), and c is the number of
analysis cells. When considering main effects, c is equal to the largest number of
levels for any one attribute. If you are also considering all two-way interactions,
c is equal to the largest product of levels of any two attributes (Johnson and Orme
2003).
Over the years, we have become concerned that practitioners use Johnson’s
rule-of-thumb to justify sample sizes that are too small. Some feel that they will
have ample stability in estimates when each main-effect level of interest is repre-
sented across the design about 500 times. But 500 was intended to be a minimum
threshold when researchers cannot afford to do better. It would be better, when
possible, to have 1,000 or more representations per main-effect level.
Copyright 2010, 2019 © Research Publishers LLC. All rights reserved.
7.6 Typical Sample Sizes and Practical Guidelines 65
7.6 Typical Sample Sizes and Practical Guidelines
The recommendations below assume infinite or very large populations. They are
based on the theories above and our observations of common practices in the
market research community:
Sample sizes for conjoint studies generally range from about 150 to 1,200
respondents.
If the purpose of your research is to compare groups of respondents and
detect significant differences, you should use a large enough sample size
to accommodate a minimum of about 200 per group. Therefore, if you
are conducting a segmentation study and plan to divide respondents into
as many as four groups (i.e., through cluster analysis) it would be wise
to include, at a minimum, 4 × 200 = 800 respondents. This, of course,
assumes your final group sizes will be about equal, so one would usually
want more data. The stronger segmentation studies include about 800 or
more respondents.
For robust quantitative research where one does not intend to compare sub-
groups, I would recommend at least 300 respondents. For investigational
work and developing hypotheses about a market, between thirty and sixty
respondents may do.
These suggestions have to be weighed against research costs. There are dif-
ficult decisions to be made based on experience, the application of statistical
principles, and sound judgment. If, after the fact, you find yourself question-
ing whether you really needed to have collected such a large sample size for a
particular project, it is an interesting exercise to delete a random subset of the
data (multiple times) to see how having fewer respondents would have affected
your findings across those replicates.
A thorough discussion of sampling and measurement errors would require
more time and many more pages. The reader is encouraged to consult other
sources in these areas. For statistics and sampling see Snedecor and Cochran
(1989) and Levy and Lemeshow (1999). For measurement theory see Nunnally
(1967).
Copyright 2010, 2019 © Research Publishers LLC. All rights reserved.