Over 2,500 individuals participated in the study, with approximately 80% taking all three
tests; all test centers had been instructed to administer all three tests to each test taker.
After data cleaning, the sample sizes ranged from 1,096 to 1,196 for individual forms.
Approximately 920 test takers took all three forms. To maximize the available data,
analyses were done with the individual test forms (Liu, Zhu, Chen, Wang, Lin, & Gao,
2017). To determine the association between Scale Scores and Level Scores, Scale Scores
were calculated for a large number of items from each assessment.
Standard Setting. ACT staff conducted a standard setting study for each assessment
with a panel of experts consisting of educators and business people, some of whom were
current WorkKeys customers. The purpose of the standard setting process is to gather
data to assist ACT in establishing the standards for achieving a defined performance level
on each of the NCRC assessments. The three skill assessments are criterion-referenced
measures; accordingly, scores on each assessment are aligned to a set of skills that a
test taker has demonstrated. The goal of the standard setting process is to identify a
point on the score scale such that test takers who score at or above that point have
demonstrated the ability to perform the skills, and test takers who score below it
have not.
The Mapmark with Whole Booklet Feedback standard setting method was used in this
study. It is a variation of the popular Bookmark procedure. The primary difference
between Mapmark and Bookmark is the Item Map. The Ordered Item Booklet (OIB) has a
sample of items from the item pool ordered from easiest to hardest, but on the item map,
the difficulty of an item is mapped to an actual scale value. The item map, therefore,
shows “how much” more difficult one item is than another. In other words, the item map
provides additional information on item difficulty.
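To make the distinction concrete, the difference between an OIB and an item map can be sketched as follows. This is purely illustrative: the item labels and scale values are hypothetical, not taken from any WorkKeys form.

```python
# Illustrative sketch only: hypothetical items with hypothetical scale values.
# An Ordered Item Booklet (OIB) conveys only the order of items by difficulty;
# an item map also records each item's scale location, so the size of the
# difficulty gaps between items is visible.

items = {"A": 72.4, "B": 75.1, "C": 75.3, "D": 81.0}  # item -> scale value (hypothetical)

# OIB view: order only, easiest to hardest.
oib = sorted(items, key=items.get)

# Item map view: order plus scale locations, so gaps can be compared.
item_map = sorted(items.items(), key=lambda kv: kv[1])
gaps = [(harder[0], harder[1] - easier[1])
        for easier, harder in zip(item_map, item_map[1:])]

print(oib)   # ['A', 'B', 'C', 'D']
print(gaps)  # shows C is barely harder than B, while D is much harder than C
```

The OIB alone would suggest items C and D are simply "next hardest"; the item map shows how much harder each one is.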
Mapmark with Whole Booklet Feedback is a three-round process, meaning that the
panelists set cut scores three times. In Round 1, the panelists 1) took each of the updated
assessments, 2) reviewed the performance level descriptors (PLDs) for each assessment
(PLDs indicate what individuals can do at each score level), 3) reviewed test items and
their associated Scale Score, 4) linked test items to the PLDs, and 5) placed bookmarks in
the OIB for each level. Specifically, the panelists were asked to divide the items for each
WorkKeys Skill Level into two groups: those they felt were easy enough for a
minimally qualified examinee at the skill level to have mastered, and those too difficult for
this expectation, where mastery is defined as having a 2-in-3 chance of success (or a
response probability of .67) on the item. This was done for each cut score: between Below
Level 3 and Level 3, Levels 3 and 4, Levels 4 and 5, Levels 5 and 6, Levels 6 and 7, and
Level 7 and Above Level 7.
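The response probability criterion above has a direct numerical interpretation. As an illustration only, assuming a simple Rasch model (not necessarily the operational WorkKeys scoring model) and hypothetical item difficulties, the RP67 rule maps a bookmarked item to the scale location where a minimally qualified examinee has a 2-in-3 chance of answering it correctly:

```python
import math

# Illustrative sketch, assuming a Rasch model: P(correct) = 1 / (1 + exp(-(theta - b))).
# Setting P(correct) = 2/3 and solving for theta gives theta = b + ln(2),
# i.e., the RP67 cut lies ln(2) scale units above the bookmarked item's difficulty.

RP = 2 / 3  # response probability criterion (.67)

def rp67_location(b: float) -> float:
    """Scale location where P(correct on an item with difficulty b) = 2/3."""
    return b + math.log(RP / (1 - RP))  # simplifies to b + ln(2)

# Hypothetical ordered item difficulties; a bookmark after the third item
# means the panelist judges items 1-3 as mastered at the cut.
difficulties = [-1.2, -0.4, 0.3, 1.1, 1.8]
bookmark_index = 3  # bookmark placed after the third item
cut = rp67_location(difficulties[bookmark_index - 1])
print(round(cut, 3))  # 0.3 + ln(2) ≈ 0.993
```

Under this sketch, moving the bookmark one item later shifts the implied cut up by the gap between adjacent item difficulties, which is exactly the information the item map makes visible.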
In Round 2, the panelists received feedback on their bookmark placements: how each
placement translated to a recommended Scale Score on the item map scale and how that
recommendation compared to the group's median cut score.
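The group-median feedback is simple arithmetic and can be sketched as follows, using hypothetical panelist recommendations rather than actual study data:

```python
import statistics

# Illustrative sketch: hypothetical Round 1 cut-score recommendations (the
# scale values implied by each panelist's bookmark). In Round 2 each panelist
# sees how their own recommendation compares with the group median.
panelist_cuts = [78.0, 80.5, 79.0, 82.0, 80.0]
group_median = statistics.median(panelist_cuts)

for i, cut in enumerate(panelist_cuts, start=1):
    diff = cut - group_median
    print(f"Panelist {i}: {cut} ({diff:+.1f} vs. median {group_median})")
```

The median (rather than the mean) is a common choice in standard setting feedback because it is robust to a single panelist's extreme bookmark placement.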