Graduate Theses, Dissertations, and Problem Reports
2010
Evaluating the Bender Visual Motor Gestalt Test II as a Diagnostic
Screening Instrument Among Clinically Referred Children and
Adolescents
Linda R. Marnic
West Virginia University
Recommended Citation
Marnic, Linda R., "Evaluating the Bender Visual Motor Gestalt Test II as a Diagnostic Screening Instrument
Among Clinically Referred Children and Adolescents" (2010). Graduate Theses, Dissertations, and
Problem Reports. 3162.
https://researchrepository.wvu.edu/etd/3162
This Dissertation is protected by copyright and/or related rights. It has been brought to you by The Research
Repository @ WVU with permission from the rights-holder(s). You are free to use this Dissertation in any way that is
permitted by the copyright and related rights legislation that applies to your use. For other uses you must obtain
permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license
in the record and/ or on the work itself. This Dissertation has been accepted for inclusion in WVU Graduate Theses,
Dissertations, and Problem Reports collection by an authorized administrator of The Research Repository @ WVU.
Evaluating the Bender Visual Motor Gestalt Test II as a Diagnostic Screening
Instrument Among Clinically Referred Children and Adolescents
Linda R. Marnic
Dissertation submitted to the
College of Human Resources and Education
at West Virginia University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
In
Counseling Psychology
James Bartee, Ph.D., Chair
Jeffrey Daniels, Ph.D.
Margaret Glenn, Ed.D.
Richard Walls, Ph.D.
Eric Youngstrom, Ph.D.
Department of Counseling, Rehabilitation Counseling,
and Counseling Psychology
Morgantown, West Virginia
2010
Key Words: Bender Gestalt II, Koppitz 2, KSADS-PL+, CBCL, Children
and Adolescents, Clinical Utility.
ABSTRACT
EVALUATING THE BENDER VISUAL MOTOR GESTALT TEST II AS A
DIAGNOSTIC SCREENING INSTRUMENT AMONG CLINICALLY
REFERRED CHILDREN AND ADOLESCENTS
LINDA R. MARNIC
This research was designed to investigate the diagnostic utility of the Bender Gestalt II
(BGII) test using the Bender Global Scoring System (BGSS) and the Koppitz 2 scoring
systems. The scores from these two systems were correlated with scores derived from the
Kiddie Schedule for Affective Disorders and Schizophrenia for School Aged Children,
Present and Lifetime Edition (KSADS-PL+), a semistructured interview along with
Longitudinal Evaluation of all Available Data (LEAD) and the Child Behavior Checklist
(CBCL). Of the 115 children and adolescents who initially participated in the study to
assess the validity of the Bender Visual Motor Gestalt II test as a screening instrument in
psychological decision making, 75 completed all protocols and the relevant data were
entered into the subsequent analysis. A correlational design was employed, and a post
hoc test was used to incorporate Receiver Operating Characteristic (ROC) analysis into
the findings. Results from both the Bender Gestalt II Global Scoring System and Koppitz
2 scoring systems showed moderate correlations with results from the CBCL on the
symptom categories of aggressive behaviors, depressed behaviors, and attention deficit
hyperactivity disorder (ADHD) in children and adolescents. However, the Koppitz 2
Emotional Indicators scoring measure did not accurately discriminate for the presence or
absence of psychopathology. The Koppitz 2 Total Error score was found to be modestly
correlated with receiving a diagnosis of ADHD. None of the other diagnoses based on
results from KSADS-PL+ with LEAD showed any significant correlations with the
Koppitz 2 Total Error Score. Adding Receiver Operating Characteristic analysis for
sensitivity and specificity improved the diagnostic likelihood ratio from 50% to 66% for
ADHD diagnosis using the Koppitz 2 Emotional Indicators. The main hypothesis that the
Bender Gestalt II would improve diagnostic accuracy of psychopathology was not
supported. The unexpected finding that the BGII is useful in diagnosing ADHD indicates
a possible direction for future research.
Acknowledgments
I would like to thank all the people who have been instrumental in the completion
of this dissertation. I would like to thank Eric Youngstrom, who believes in promoting
the field and cultivating the next generation of psychologists. His positive demeanor and
endless energy are a benefit not only to the areas of research and academia but to each
student under his tutelage. I cannot thank Margalit Persing, Andrew Freeman,
and Dr. James Bartee enough for their editorial work and numerous reviews that made this paper
readable. Also, I am grateful to the whole Applewood family who made my years in
Cleveland truly wonderful. My inspiration has been the various married working mothers
in the field of psychology. They have shown me that it’s not impossible to achieve your
dreams, impact the lives of others, and still have enough left to be active, loving mothers
to wonderful children and partners to the men in their lives. I have found from their
examples that the more you give, the more you have to give. Thank you.
I would like to acknowledge my family and friends. Stephen, my partner in this
life, your love and patience are my strength. You have given the most valuable gift in my
life: our children. My life is chaotic and difficult with work, kids, activities, and finding
time to write; however, you have stood by me, honored me, and that means so much.
Thank you.
I would like to thank all the clients I have seen over the years. Without them I
would have never started this endeavor and maintained the belief that I make a
difference. I know that I have learned from them as much as they from me. I realize that
this paper is truly a gift from God, and I could not achieve anything without His help.
When I thought all was lost, and I couldn’t take another disappointment, I found strength
through prayer. I also thank the many who encouraged me to finish and were there for
me. Thank you. I am a better person for all the relationships I have had or will have in
my life. They have shown me the person I can be and what I am capable of doing. I thank
you all. I am truly blessed. If it were not for them then I would not have become me. I
am grateful and indebted to all of you.
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction and Overview
Statement of the Problem
Research Questions
Research Significance
Definition of Terms
Chapter 2: Review of Selected Literature
Research Focus
The Bender Visual Motor Gestalt Test
Current Use of the Bender Visual Motor Gestalt–Second Edition
The Koppitz 2 and the Koppitz Developmental Scoring System
Kiddie Schedule for Affective Disorders and Schizophrenia for School-Age Children
The Child Behavior Checklist
Chapter 3: Method
Participants
Measures
The Bender Visual Motor Gestalt II - Global Scoring System
The Kiddie Schedule for Affective Disorders and Schizophrenia for School-Age Children (K-SADS)
The Child Behavior Checklists (CBCL) and Youth Self Report Form (YSR)
Examiners
Procedure
Data Analyses
Chapter 4: Results
Research Question 1
Research Question 2
Research Question 3
Research Question 4
Chapter 5: Discussion
Discussion of Hypotheses
Limitations of the Study
Implications for Practice
Conclusions
References
Appendix A: IRB Protocol
Appendix B: CREC Program Notice of Certification
Appendix C: Administration Manual for ABACAB
Appendix D: Demographic Form
Appendix E: Consent Forms
Appendix F: Statistical Findings
Author’s Note
List of Tables
Table 1 Pearson Product-Moment Correlation Matrix Between the BGSS Copy and
Recall Scores and the CBCL Main Scores
Table 2 Pearson Correlational Matrix of the BGSS Scores and the CBCL Syndrome
Scores
Table 3 Pearson Correlational Matrix Comparing BGSS and CBCL Diagnostic
Scores
Table 4 Pearson Correlational Matrix Comparing CBCL and Koppitz 2 Scores
Table 5 Pearson Correlation Matrix Comparing Koppitz 2 Total Score, Total Emotional
Indicators, and Visual Motor Index to the CBCL Syndrome Scales
Table 6 Pearson Correlation Matrix Comparing the BGII, BGSS, and Koppitz 2 Scores
and KSADS-PL+ Symptomology
Table 7 Pearson Product-Moment Correlations of the Koppitz 2 Scores and KSADS-PL+
Diagnoses
List of Figures
Figure 1. ROC of errors made using the Koppitz 2 scoring system to predict
UTILITY OF THE BGII 1
Chapter 1: Introduction and Overview
In an era of increased emphasis on accountability and outcomes, psychologists are
continually pushed to become more accurate in diagnosing and treating clients while
balancing the cost of providing services. The task of developing more efficacious
diagnostic and treatment protocols is a challenging one in this time of managed
care and limited client contact, due in part to insurance considerations and heavy
caseloads. Finding valid diagnostic measures that are inexpensive, easy to administer,
and reliable, and that serve multiple purposes, would be beneficial to all. This research
investigated the use of the Bender Visual Motor Gestalt II (BGII) as a possible
diagnostic tool to add to clinical protocols that assist in initial diagnostic decision making
for children and adolescents.
The field of psychology is faced with many challenges. Mental-health statistics
for children in the United States are very disturbing. More than 400,000 children are in
therapy for treatment of a diagnosed mental illness (Kamphaus, Petoskey, & Rowe,
2000). Additionally, over 5 million children have received psychoeducational evaluations
in public schools to assess for learning, behavior, and information processing problems
(Kamphaus et al., 2000). Astoundingly, this number does not include those children who
were evaluated outside of the school system for various mental-health or neurological
problems. It has been reported by Tolan and Dodge (2005) that over two thirds of the
children and adolescents who receive mental-health services have been previously
treated, and over three quarters of adult clients in treatment report that their problems
began in their childhood years. These staggering numbers further highlight the need for
increased accuracy and more efficient diagnostic protocols in practice. The importance of
early identification of mental-health disorders and the need for effective forms of
treatment have been the focus of the American Psychological Association for several
years (Tolan & Dodge, 2005). However, it is important to note that effective treatment
relies on the ability of the treating clinician to provide an accurate diagnostic profile of
the individual.
Diagnosis is a cornerstone of the practice of counseling psychology. Nowhere is
this more evident than in the work with children and adolescents. Not only is the
formulation of a diagnosis the first step in treatment planning, but it also lays the
foundation upon which all future therapeutic work is built. The Diagnostic and Statistical
Manual of Mental Disorders, 4th edition, Text Revision (DSM-IV-TR) is the current manual on which all
clinical diagnostic decisions are based and is considered to be the most comprehensive
psychological diagnostic manual to date (American Psychiatric Association, 2000). Use
of the DSM-IV-TR has been credited with an increase in appropriate diagnoses and early
intervention in significant childhood mental illnesses. This in turn has led to improved
childhood outcomes in autism spectrum disorders, attention deficit hyperactivity disorder,
childhood depression, and bipolar disorder (American Psychiatric Association, 2000;
Charman & Baird, 2002; Lipovsky, Finch, & Belter, 1989; Valderhaug & Ivarsson,
2005). However, one of the criticisms of reliance on the DSM-IV-TR is that the manual
allows too much overlap among different diagnoses (DeClercq, DeFruyt, Van Leeuwen, &
Mervielde, 2006). Questions have also been raised about the meaning of disorders
categorized as adult disorders with consideration of a childhood onset, leading to further
overlap and difficulty among professionals with regard to treatment venues (Sourander et
al., 2005).
Psychologists have long sought quick, inexpensive, and empirically sound
measures that would assess pathology while giving maximum information with minimum
time spent scoring. The original Bender Visual Motor Gestalt (OBG) Test has long
been one such quick and direct measure of a child’s ability to perform visual motor
integration tasks. The OBG has a history of meeting these needs by being fast, easy to
administer, and capable of assessing multiple areas of client ability. It has been used by
clinicians as a standard measure in psychological batteries for more than 60 years. Before
any further detailed discussion of the revised Bender and its use in this research, a brief
discussion of the history of this significant test is in order.
The study of children using direct measures, rather than
questionnaires, has been debated for many years. German psychologists Brentano
(1838-1917) and Stumpf (1848-1936) supported the study of children and advocated
experimental studies based on internal cognitive processes rather than the questionnaire
approach pioneered by Hall (1844-1924) and early functionalists in the United States
(Hothersall, 1995).
The use of drawing as a medium for the measure of visual-motor integration
ability has been. The original Bender-Gestalt Test (OBG), titled The Bender Visual Motor
Gestalt Test (Bender, 1938), was composed of drawings to be visually perceived and
reproduced by patients. The drawings were developed by Wertheimer (1880-1943) in his
perceptual psychology experiments and later formalized by Bender (1897-1987). Wertheimer was
one of the original Gestalt theorists whose theoretical works were cut short by World War
II and his flight from Nazi Germany. Although he came to the United States from
Germany, he did little to advance Gestalt theory in the United States in the years after the
war (Hothersall, 1995). Bender (1938), a psychiatrist working in New York, further
experimented with the figures by using the drawings with both her child and adult
patients. The resulting cards that Bender developed were slightly different from the
visual motor patterns of the originals by Wertheimer. The resultant client drawings were
considered an integration of the discrete internal processes of the drawer (Bender, 1938).
Based on Gestalt theory, the resultant drawing represents more than visual and motor
associations within the physical body. The drawings reflect association and cognitive
complexity within the individual. Individual variables that may affect the accuracy of a
drawing include age, physical and emotional development, as well as individual mental
and emotional states (Bender, 1938).
According to Bender (1938), developmental maturation of drawing is an ongoing
process that follows sequential stages, incorporating motor development from gross to
fine motor, visual imagery, and perceptual awareness. Thus, a beginning scribble
becomes more circular. Circles become loops, movements shift from vertical to
horizontal, and finally dimensional awareness emerges in the drawing.
These factors are all components of the maturational process that leads to the more
intricate representations found within mature representational drawings which then
represent a completed integration of internal processes. A deviation within this
developmental or maturational process would obviously lead to a disintegration of the
original representation. Psychology has found this process helpful in identifying those
individuals who have not yet matured, were delayed in visual motor perception, or who
were once matured, yet for various reasons may be losing such integration faculties.
The original Bender test (OBG) consists of nine figures on separate 3 x 5 cards.
There have been many research articles that have utilized the Bender as a criterion in
developmental processing, perceptual motor skills, and neurological intactness
(Brannigan & Decker, 2003). Horn and O’Donnell (1984) researched the early
identification of learning disabilities and found that the OBG was effective in identifying
those children classified as learning disabled and those with low achievement. Based on
original theory, the cards were also used as a direct measure of the underlying emotional
state in those thought to be of normal development and as a personality assessment to
assess internal motives (Hutt, 1985). In addition, the OBG was found to be useful in comparing
impulsive adolescents to those without impulse difficulties (Oas, 1984). Oas (1984)
found that adolescents with impulsivity disorders were significantly different from those
designated as nonimpulsive on the Matching Familiar Figures Test, a behavior rating
scale.
The OBG monograph (Bender, 1938) was also used for patients within psychiatric
hospitals to differentiate between the functionally mentally ill and malingerers. However,
later research by Pascal and Suttell (1952) found this function to be invalid.
Subsequently, Mehlman and Vatovec (1956), Bowland and Deabler (1956), and Stewart
(1957) found that the Bender was reliable in differentiating between psychiatric and
nonpsychiatric patients being admitted to the hospital. Regardless of differing findings,
professionals continue to find the Bender to be of value in their evaluations and even
prior to revision it remained a favored test in use (Piotrowski, 1995; Piotrowski & Keller,
1989). Bender’s scoring system evaluated the overall quality of each design on a scale
that ranged from 1 to 5 on one design and from 1 to 7 on the other eight designs. Her
scoring system is based on accuracy toward perfection of the design in a completion of a
gestalt (Bender, 1938).
In an attempt to validate a direct measure of the emotional state of the individual
or the projective use of the Bender, different psychologists developed specific scoring
procedures. The psychologists included Pascal and Suttell in 1952, Hutt and Briskin in
1960, Koppitz for children in 1963, Keogh and Smith in 1961, and Canter in 1976. These
specific scoring procedures have been joined by more recent scoring systems such as the
Advanced Psychodiagnostic Interpretation Scale by Rosenberg and Raphael in 2000
(Canter, 1968; Keogh, 1965; Pascal & Suttell, 1952; Piotrowski, 1995). These diverse
scoring systems have been criticized for insufficient reliability. Hutt’s popular scale for
determining psychopathology, developed in 1977, was found in 1983 to have
questionable reliability and validity (Rossini, 1983). Keogh and Smith’s scoring system
did not provide normative data (Brannigan & Decker, 2006). The
questionable validity and the numerous scoring systems have led to a slight decrease in the
use of the OBG cards among professionals (Archer et al., 1991; Wilson & Reschly,
1996). However, this situation has further led those in favor of the OBG to explore its
uses and revise the test with the goal of standardization as an empirically valid measure
(Brannigan & Decker, 2003, 2006; Brannigan, Decker, & Madsen, 2004; Reynolds,
2007).
The revision of the OBG was in process for many years, first at the American
Orthopsychiatric Association then later at Riverside Publishing, which held the
copyrights, with the work of many advisors and more than 25 years of collaboration
(Brannigan & Decker, 2006). Tolor and Brannigan in 1980 stressed the need for research
on the OBG, not only to assess personality dynamics and psychopathology, but also to
diagnose organic pathology, and predict school learning problems (Brannigan & Decker,
2006; Brannigan, Decker, & Madsen, 2004). Four main components were of importance
during the revision process: (a) keeping the original nine designs, but increasing the
number of designs; (b) inclusion of a memory procedure; (c) comparison of both the
deviation and quality based scoring systems; and (d) obtaining results from a nationally
representative sample for validity (Brannigan & Decker, 2003, 2006). The Bender Gestalt
II (BGII) was released for publication in 2003 by the Riverside Publishing Company.
Along with the release of the BGII, the Global Scoring System (BGSS) was released as a
recommended standardized measure by Riverside Publishing Company.
Revisions were also begun on the original Koppitz scoring system. Pro-Ed
Publishing obtained the rights to the Koppitz scoring system and retained Cecil Reynolds
to revise its scoring version. It was released in 2007 as the Koppitz 2: the Koppitz
Developmental Scoring System for the Bender-Gestalt Test II (Reynolds, 2007). The
Bender Gestalt II Global Scoring System (BGSS) and the newly revised Koppitz 2
developmental scoring system are now used as the preferred empirically validated
measures for visual motor integration. However, they have not been researched beyond
this use and thus may not be a valid measure for personality assessment. It is worth
noting that the Koppitz 2 (Reynolds, 2007) has included the Emotional Indicators as an
additional measure to assess personality; however, this component was not included in the
original norming process.
Only one unpublished dissertation, from 2004, has used the Bender Visual Motor
Gestalt Test II (BGII) with the scoring of the Koppitz 2 Emotional Indicators. Fidal
(2004) examined the BGII, using the Koppitz 2 Developmental Scoring System and the
Emotional Indicators scoring system, with adolescents who had a history of abuse versus
those who reported no such history. Fidal found that independent t tests did not
significantly differentiate between the groups (Fidal, 2004). More research is needed to
compare the two methods to determine whether to support or retire the use of the
Emotional Indicators in order to end the controversy about the original Koppitz
Emotional Indicators. A literature search for the use of the BGII as an assessment in
evidence-based practice yielded no results. Thus, the current research may be the first to
address the practical use of the BGII with the BGSS and the Koppitz 2 Scoring System in
clinical decision making using likelihood ratios (Koppitz, 1968, 1971, 1975).
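For readers less familiar with likelihood ratios, the following minimal Python sketch illustrates how sensitivity, specificity, and diagnostic likelihood ratios are conventionally derived from a 2 x 2 table, and how a likelihood ratio updates a pre-test probability through odds. The class and function names are hypothetical, and the counts are invented for illustration; they are not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from crossing a screening result with a reference diagnosis."""
    true_pos: int   # screen positive, diagnosis present
    false_neg: int  # screen negative, diagnosis present
    false_pos: int  # screen positive, diagnosis absent
    true_neg: int   # screen negative, diagnosis absent

    @property
    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)

    @property
    def lr_positive(self) -> float:
        # Likelihood ratio for a positive screen: sensitivity / (1 - specificity).
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def lr_negative(self) -> float:
        # Likelihood ratio for a negative screen: (1 - sensitivity) / specificity.
        return (1.0 - self.sensitivity) / self.specificity


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical counts for illustration only (not results from this study).
table = TwoByTwo(true_pos=30, false_neg=20, false_pos=15, true_neg=35)
posterior = post_test_probability(0.50, table.lr_positive)

print(f"sensitivity={table.sensitivity:.2f} specificity={table.specificity:.2f}")
print(f"LR+={table.lr_positive:.2f} LR-={table.lr_negative:.2f}")
print(f"post-test probability={posterior:.2f}")
```

The sketch does not guard against a specificity of exactly 0 or 1, where one of the ratios is undefined; a real analysis would need to handle those cases.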
Statement of the Problem
This study attempts to address one of the main deficits that can occur in the
diagnosis and treatment of children and adolescents with mental-health issues, namely the
difficulty in finding a direct measure of the child’s functioning and internal state without
either asking overly obvious questions or relying on information from significant others.
Thus, this study will combine the task of the practitioner with that of the clinical
researcher to determine the practicality for the more frequent use of the BGII. The OBG
has been widely used as a projective screening device, as previously defined. However,
most of the research was limited to adult pathology. Little research was conducted in
regard to childhood pathology and less was directed toward the adolescent population
(Belter, McIntosh, Finch, Williams, & Edwards, 1989; Rossini & Kaspar, 1987). In
children, the OBG was used mainly as a test of visual motor development and the
research at the time supported this conclusion (Decker, Allen, & Choca, 2006; Koppitz,
1971). However, the original Bender test, now the BGII, has been revised to include more
drawing items that extend the baseline and ceiling of the test, to add a recall
procedure, and to provide an empirically normed scoring system called the Global
Scoring System (BGSS) for standard use among clinicians (Brannigan & Decker, 2003,
2006). It would be of benefit to clinicians if the revisions and new scoring systems
contributed to the psychologist’s ability to accurately diagnose. Furthermore, it would be
beneficial to compare the newly revised BGII forms with another highly valid behavioral
report measure that identifies childhood and adolescent diagnostic areas of concern in an
attempt to revive the original purpose of the Bender as a psychodiagnostic technique. It
also might be useful to compare the BGII findings with the newer clinical assessments
available to psychologists such as the semistructured clinical interview.
In this study the results of the BGSS and the Koppitz 2 scoring systems for the
newly revised Bender Visual Motor Gestalt II (BGII) test were compared with two
clinical screening measures of mental disorders: the Child Behavior Checklist, known as
the CBCL (Achenbach, 2004), and the Kiddie Schedule for Affective Disorders and
Schizophrenia-Present and Lifetime Edition, known by the acronym KSADS-PL+
(Findling et al., 2001; Youngstrom & Duax, 2005; Youngstrom, Findling, Danielson, &
Calabrese, 2001). The CBCL and KSADS-PL+ have been previously compared to each
other in research (Wassenberg, Max, Koele, & Firme, 2004; Youngstrom et al., 2001).
The current research is exploratory in nature, as it attempts to find relationships
between the revised Bender Gestalt II and the CBCL and KSADS-PL+ used in the
diagnosis of childhood psychopathology. In order to assess the clinical utility of these
results, the BGII data scored with the BGSS and Koppitz 2 were compared to results derived
from the previously mentioned valid measures of pediatric diagnostic decision making,
the KSADS-PL+ and the CBCL. In the current study, two updated scoring procedures for
the BGII test were compared, the BGSS and the Koppitz 2, to see if either provided
clinicians with a valid, cost-effective, projective-screening measure that would aid in
diagnosis of mental-health issues in clinical practice with children.
Research Questions
RQ1. Is the BGII, using the BGSS and Koppitz 2 scoring measures, an effective
psychometric screening tool to use with a clinically referred population of children aged
5 to 18 when compared to a commonly used screening instrument such as the Child
Behavior Checklist (Achenbach System of Empirically Based Assessment [ASEBA],
2006)?
RQ2. Are the scores derived from the BGSS and the Koppitz 2 scoring systems
effective measures for the diagnosis of the presence or absence of pathology in subject
children and adolescents when compared to the Washington University version of the
Kiddie Schedule for Affective Disorders and Schizophrenia (WASH-U-KSADS-PL+), a
research diagnostic instrument?
RQ3. Are there significant relationships between the results derived from the BGII,
using the BGSS and Koppitz 2 scoring systems, and the final pediatric DSM-IV-TR
diagnoses reached by LEAD consensus following the KSADS-PL+?
RQ4. Are there any significant relationships between the BGSS and Koppitz 2
scores (including the Koppitz 2 Emotional Indicators) derived from the BGII when it is
used with clinically referred children from 5 to 18 years of age?
Research Significance
The significance of this study lies in the joining of the BGII and its BGSS scoring
system with the KSADS-PL+, a semistructured diagnostic interview; the study was termed
“groundbreaking” research by Gary G. Brannigan (personal communication,
11/30/2006). Brannigan is the co-author of the Bender Visual Motor Gestalt Test –
Second Edition (BGII). Comparisons of the BGII with several behavioral rating scales
(Belter et al., 1987; McCormick & Brannigan, 1984) and pathology groups (Field, Bolton,
& Dana, 1982; Rossini & Kaspar, 1987; Shapiro & Simpson, 1995) yielded varying
findings. Shapiro and Simpson (1995) found that primary psychiatric diagnosis was not
related to Bender performance when using the earlier Koppitz scoring system. However,
these studies were conducted with the OBG and are now considered outdated. This study
will examine individual patient drawings and emotional factors to determine the
diagnostic value of assessing childhood pathology in relation to the BGII cards and the
two scoring methods as previously mentioned. From this study, it is hoped that a simple
test will eventually emerge providing a bridge between an extensive and laborious
clinical evaluation and a parental checklist of subjective problems, thus providing another
source of valid psychometric data to be used in psychological batteries. Such a test may
prove to be an effective, additional diagnostic instrument in standard intake protocols.
This study was open to all children 5 to 18 years of age who sought mental-health
services at a large community mental-health center in Cleveland, Ohio. It was part of a
larger study which included profiles of 825 children. The July 2006 population of
Cleveland was approximately 444,313, with an estimated median household income of
$24,105 (2006). The percentage of residents living in poverty was 32.4% in 2005 (20.4%
for White non-Hispanic residents, 39.3% for African American residents, and 38.5% for
Hispanic residents). The racial makeup of Cleveland in 2006 was 51% African American,
38.8% White non-Hispanic, 7.3% Hispanic, 0.9% American Indian, and 3.6% other
races, with 2.2% of the population listed as two or more races. The Cleveland Municipal
School District is the largest school district in Ohio (City Data.com, 2009).
Definition of Terms
Base rate. This is a quantitative representation of a particular event occurring in a
population or setting. Base rates in mental-health settings are usually based upon
demographic information of the clients, and can be influenced by referral sources,
geographical locations, previous diagnoses given by different clinicians, and the market
area of the clinician. Base rate is the starting point in diagnostic decision making when
using Evidence Based Practice assessment (Youngstrom & Duax, 2005).
The Bender Gestalt II Global Scoring System. The BGSS was specifically
designed to assess visual motor integration across a lifespan and aid in discriminating
various types of learning, psychological, and neurological problems using the revised
BGII (Brannigan & Decker, 2003). It has been chosen because it represents the first
standardization of the revised drawings of the OBG and is preferred by the BGII
publishers, Riverside Publishing Company. The BGSS allows two phases of
administering the BGII, the copy phase and the recall (memory) phase with a total score
developed for each phase (Brannigan & Decker, 2003).
Bender Visual Motor Gestalt Test II. The BGII is the test based on the original
nine Bender drawings (OBG) plus seven additional drawings developed by Brannigan
and Decker for the revised edition. These researchers added four cards to the beginning
of the protocol to be used for children 4 to 7 years of age resulting in a total of 13 cards.
Three additional cards were added to the end of the original protocol for the subjects over
eight years of age resulting in a total of 12 cards.
The Child Behavior Checklist. The CBCL, developed by Achenbach and
Rescorla (2001), is a rating scale commonly used to assist in making a pediatric
psychological diagnosis. It assesses a broad range of behavioral symptoms found in
children with emotional difficulties (Achenbach, 1991). The CBCL was revised in 2001
when two different forms were created: the CBCL for ages 1 to 5 years (CBCL/1-5) and
the CBCL for ages 6 to 18 years (CBCL/6-18; ASEBA, 2006). The CBCL uses three
different forms for reporting: parent, teacher, and self-report. Rater diagnosis is based on
how often each item has occurred currently or within the past 6 months for specific
questions using a forced 3-point scale response form. There are two open-ended items for
the individual to report additional problems. However, the open-ended items are
qualitative and are not included in standard scoring (ASEBA, 2006). The CBCL
provides T scores and percentile scores for different behavioral areas. These areas are
divided into three competency scales (Activities, Social, School), a total competency
scale, eight syndrome scales, two broad problem scales, and a total problem scale. The
eight syndromes are aggressive behavior, anxious/depressed, attention problems, rule-
breaking behaviors, social problems, somatic complaints, attention-deficit hyperactivity
problems, oppositional-defiant problems and conduct problems (ASEBA, 2006). The
broad problem scales are comprised of internalizing and externalizing problem scales.
High T scores are scores above 70 on the problem scales and are indicative of pathology
in that particular scale. However, resultant high T scores on the competency scales are
indicative of mental resiliency and internal strengths of the individual (ASEBA, 2006).
Diagnostic Likelihood Ratio (DLR). The ratio of two proportions: the proportion of
people with a particular test result among all those who have a specific condition divided
by the proportion of people with the same test result among all those without the condition
(Hamza, 2008). A DLR is the ratio of the posttest odds of having a particular diagnosis
to the pretest odds of having that diagnosis among the general population based on a
specific assessment (Choi, 1998). This is calculated by obtaining the sensitivity and
specificity of a particular test. Sensitivity is the probability of obtaining a positive test
result among those with a true diagnosis. Specificity is the probability of obtaining a
negative test result among those individuals without the particular diagnosis (Choi,
1998).
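The definition above can be written compactly. The following is the standard textbook formulation rather than notation taken from this study; the symbols Se (sensitivity) and Sp (specificity) are introduced here only for illustration.

```latex
\mathrm{DLR}^{+} = \frac{\mathrm{Se}}{1 - \mathrm{Sp}}, \qquad
\mathrm{DLR}^{-} = \frac{1 - \mathrm{Se}}{\mathrm{Sp}}, \qquad
\text{posttest odds} = \text{pretest odds} \times \mathrm{DLR}
```

As a purely hypothetical example, a test with Se = 0.80 and Sp = 0.90 would have a DLR for a positive result of 0.80 / 0.10 = 8 and a DLR for a negative result of 0.20 / 0.90, or approximately 0.22.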
Evidence-based practice. An effective treatment approach that is used for
specific disorders based on systematic empirical research. Most areas of medicine,
psychology, and sociology encourage the promotion of treatments based on empirical
evidence (Luebbe, Radcliffe, Callands, Green, & Thorn, 2007; Youngstrom & Duax,
2005). Evidence-based practice incorporates the best research evidence available, patient
preference and clinical judgment in an effort to combine clinical practice and research
advancements in psychology to advance the field (Luebbe et al., 2007).
KSADS-PL+ with LEAD percentages. The Kiddie Schedule for Affective
Disorders and Schizophrenia for School-Age Children (K-SADS) is a semistructured
interview for the psychiatric assessment of children and adolescents aged 6 to 18. It was
adapted from the adult version, Schedule for Affective Disorders and Schizophrenia
(SADS) developed in 1978 by Endicott and Spitzer (Ghanizadeh, Mohammadi, &
Yazdanshenas, 2006). There are currently three accepted versions of the KSADS. The
Kiddie Schedule for Affective Disorders and Schizophrenia- Present and Lifetime Edition
(KSADS-PL+) was originally developed by Puig-Antich and is the version used in this
study (Ghanizadeh et al., 2006). The KSADS-PL+ version is a further modification that
includes additional items sensitive to symptoms of depression and mania and has been
used to assess the presence of 32 DSM-IV-TR diagnoses. Scoring allows the researcher to
question for the presence or absence of various symptoms within essential DSM-IV-TR
categories with further questioning based on the clinical judgment of the interviewer
(Ghanizadeh et al., 2006). The current version includes items from the Young Mania
Rating Scale as well as the mood disorders module originated at Washington University,
(WASH-U-KSADS-PL+; Geller et al., 2001; Youngstrom, 2005).
The Koppitz-2 Visual Motor Index. This instrument is a measure of overall
visual-motor integration skill and is defined as the ability to relate visual stimuli to motor
responses in an accurate and appropriate manner based on the developmental scoring
system of Elizabeth Koppitz and adapted for the Bender Gestalt II (BGII; Reynolds,
2007). The Koppitz 2 raw score is based on yes/no questions specific to common errors.
On the Koppitz 2, each drawing may have several questions regarding errors, such as on
Design 7, the columns are all slanted left to right, No = 0, Yes = 1; the more errors
produced by the individual, the lower the score. The Koppitz-2 Visual Motor Index was
an age-corrected deviation-scaled score set (M = 100, SD = 15). This error-based
approach is different from the BGSS, which focuses on drawing accuracy in copying the
original stimulus card. The categorical descriptive ratings were identical to the BGSS as
previously described: 80-89 is below average, 90-109 is average, and 110-119 is high
average (Reynolds, 2007). The time to complete the drawings has been standardized for
age and results are interpreted according to the number of errors (Reynolds, 2007). The
older version of the Koppitz Developmental Scoring System (1975) was the most
commonly used procedure to score the OBG designs produced by children ages 5 to 12
(Shapiro & Simpson, 1995).
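The deviation-scaled metric and descriptive bands described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the z-score conversion is the generic definition of a deviation score (M = 100, SD = 15) and not the publisher's norm tables, and only the three bands named in the text are encoded.

```python
from typing import Optional


def deviation_scaled_score(z: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Convert an age-referenced z score to a deviation-scaled score.

    Generic definition only; the published test uses its own norm tables.
    """
    return mean + sd * z


def descriptive_rating(standard_score: int) -> Optional[str]:
    """Return the descriptive band for an integer standard score.

    Only the three bands listed in the text are encoded; scores outside
    80-119 return None because the text does not name those bands.
    """
    if 80 <= standard_score <= 89:
        return "below average"
    if 90 <= standard_score <= 109:
        return "average"
    if 110 <= standard_score <= 119:
        return "high average"
    return None
```

For instance, a hypothetical child scoring one standard deviation below the age mean would receive a deviation-scaled score of 85, which falls in the below-average band.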
The Koppitz 2 adds an additional score to the BGII, which is the Koppitz 2
Emotional Indicators (EI). The EI score is based on the OBG and original research
regarding emotional pathology. Koppitz (1975) developed the EI for use as a
projective measure to identify children with emotional problems (Reynolds, 2007).
Unlike the Koppitz 2 Visual Motor Scores, the Emotional Indicators are added based on
errors drawn. The Koppitz 2 Visual Motor Index scores and Emotional Indicators scores
have been included in this study as they are popular for use among clinicians treating
children and adolescents.
Longitudinal Expert Evaluation of all Available Data (LEAD; Spitzer, 1983).
The LEAD assessment involves a formal review of all the data by an expert or team of
experts (Klein, Ouimette, Kelly, Ferro, & Riso, 1994; Pilkonis, Heape, Ruddy, & Serrao,
1991). LEAD was held post interview and was a final review of all presenting data under
the direction of a licensed child psychologist. LEAD was held in person, by telephone, or
video conferencing with all raters reviewing the results of the KSADS-PL+, along with
the child’s supplemental information such as the child’s history, family history, and
school behaviors. Final diagnoses were made by consensus based on a degree of
certainty by the assigned raters.
Pediatric Diagnostic Categories. The categories use the DSM-IV-TR as the basis
for the diagnosis and categorization of mental disorders for both research and clinical
practitioners. Affective Disorders include both the unipolar and bipolar types of affective
disorders. Behavioral disorders include Attention Deficit Hyperactivity Disorder
(ADHD) and Disruptive Behavior Disorders. The Residual Disorder category includes
Anxiety Disorders which further includes Panic Disorders, Obsessive Compulsive
Disorder, Posttraumatic Stress Disorder, Phobias, or any other diagnosed anxiety disorder
within the DSM-IV-TR. The Residual Disorders category may also include those
individuals not diagnosed with any disorders from those major categories but who were
still seeking mental-health interventions. These may include those with Adjustment
Disorders, Psychotic Disorders, Learning Disorders or those individuals that have
symptoms that do not meet any DSM-IV-TR Axis I criteria (Youngstrom et al., 2001).
Psychometric screening tool. This is an instrument developed to determine
minimum criteria for inclusion in a specified group without the loss of the reliability or
accuracy of the instrument (Hildreth, 1945). A screening instrument is often meant to be
a brief assessment within a larger more detailed examination.
Receiver Operating Characteristic (ROC). ROC is a statistical method which
graphically plots the sensitivity or true positive rate against the false positive rate of the
occurrence of a specific condition. This is done to illustrate the accuracy of a particular
diagnosis using a specific measure versus the probability of an inaccurate diagnosis using
that same measure (Zweig & Campbell, 1993). The ROC analysis and the ROC curve are
useful in the selection of optimal tests or measures and the dismissal of nonuseful ones
based on a statistically derived discrimination threshold. The ideal prediction measure
results in an area under the ROC curve of 1.0, which represents 100% sensitivity and 100%
specificity regarding a particular diagnosis. This outcome is highly unlikely, however, as
there are usually a number of false negative and false positive outcomes in diagnostic
testing (Zweig & Campbell, 1993).
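A minimal sketch of how an ROC curve and its area can be computed is given below. It is illustrative only and is not the analysis code used in this study: the function names and any scores passed to them are hypothetical, and the sketch assumes that higher scores indicate a greater likelihood of the condition (a measure on which lower scores indicate impairment would need to be reversed first).

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    labels are 1 (condition present) or 0 (condition absent); a case is
    called positive when its score is at or above the cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # strictest possible cutoff: nobody is called positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

With hypothetical data in which every affected child scores higher than every unaffected child, the area is 1.0; when the scores carry no information about the diagnosis, the area is 0.5, the value expected by chance.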
Semistructured interview. This is a diagnostic assessment technique with
specific standardized questioning areas. These include suggested questions to cover
specific DSM areas while using the clinical knowledge and judgment of the rater to
decide on appropriate interventions for each individual client (Geller et al., 2001).
Chapter 2: Review of Selected Literature
The psychological interview and the psychological assessment battery are two
methods used by clinicians in making psychological and psychiatric diagnoses. Once a
diagnosis has been formulated the results of the interview and assessment battery can be
further used in developing an effective treatment plan. There are three basic methods for
diagnosing individuals: the more traditional, less structured, open-ended interview; the
structured interview; and the semistructured interview. The more unstructured clinical
interview has given way to more formalized interviewing techniques in order to increase
accuracy and accountability in diagnosis and treatment. Each method has its advantages
and disadvantages. The unstructured interview is qualitatively based, whereas the
structured interview is quantitative and symptom based, while the semistructured
interview attempts to balance diagnostic accuracy with clinical meaning by combining
aspects of both the unstructured and structured approaches (Jellinek & McDermott,
2004). Additionally, the intervention of managed health-care panels and insurance policy
constraints has changed the level of accountability for psychologists with the requirement
to evaluate and diagnose based on the DSM-IV-TR (Cashel, 2002; Jellinek & McDermott,
2004).
Cashel (2002) found that the training of clinicians to conduct psychological
assessments has changed very little over the years even with changes in psychological
training objectives which now focus more on evidence-based practices (Luebbe et al.,
2007). Practicing clinicians working with children have, in recent years, increased their
use of structured observations, behavior rating scales, and shorter formats of intelligence
measures (Archer et al., 1991; Cashel, 2002). These abbreviated measures and screening
instruments also serve as a form of cost containment for psychologists (Archer et al.,
1991). Cashel noted that many of the clinicians surveyed reported significant
limitations placed upon them by outside sources. Cashel further believed that this ultimately
restricted effective diagnostic decision making. Clinicians find themselves bound by
increased pressure to be more outcome and diagnosis based and less focused on process
in order to quickly arrive at a diagnostic label.
Contrary to this tendency is the reality that appropriate diagnosis and early
intervention in mental illness are not only necessary but also have led to improved
outcomes in children with different clinical diagnoses, such as autistic spectrum
disorders, attention deficit hyperactivity disorder, depression, and bipolar disorder
(American Psychiatric Association, 2000; Charman & Baird, 2002; Lipovsky et al., 1989;
Valderhaug & Ivansson, 2005). For example, Charman and Baird (2002) found that
increased recognition of symptoms by primary health practitioners, more frequent use of
screening instruments by professionals, and evidence that intervention improves
outcomes, have all contributed to the earlier diagnosis of autism from a mean age of 12
years to four years. Kamphaus et al. (2000) found that the expansion of school-based
psychological services has led to an increased awareness of childhood mental-health
issues and earlier identification of those in need of instructional and developmental
services.
Psychologists continue to have an important role in striving to maintain a high
standard of care while adhering to the goals of achieving diagnostic accuracy and dealing
with the demands of an ever-changing profession. Therefore, the initial diagnostic
assessment becomes a pivotal point in the treatment process (Krueger & Finger, 2001). A
thorough diagnostic assessment should include detailed information on development,
family and social history, parental description of everyday behavior, activities of the
child, direct assessment of the child’s communicative, intellectual and adaptive
functioning, as well as the child’s self-perception (Charman & Baird, 2002; Elbert &
Holden, 1987; Sourander et al., 2005). How psychologists obtain that information in the
clinical interview has been the prerogative of the clinician. Originally, a subjective
clinical interview process was employed, one that utilized open-ended questions and
projective assessment measures to evaluate and diagnose (Jellinek & McDermott, 2004).
Under certain conditions, clinical judgment came less into play as the use of behavioral
assessment tools, both formal and informal, became available to assist in information
gathering and assess client functioning (Aklin & Turner, 2006; Dryden, 1986). However,
it still remains largely an area of individual preference for each clinician to decide exactly
how to gather data, evaluate, and report findings.
Evaluated data and findings are then summarized into a clinical picture of the
individual using language common to professionals and treatment specialists regardless
of their theoretical orientations (Aklin & Turner, 2006; Dryden, 1986). The DSM-IV-TR
(American Psychiatric Association, 2000) is the current clinical resource manual that
guides diagnostic decision making, and that allows professionals to have a common basis
or language in which to initiate treatment, coordinate care, and discuss prognostic
concerns (Morgan, Olson, Krueger, Schellenberg, & Jackson, 2000). The authors of the
DSM-IV-TR do not claim to adhere to any particular theoretical framework (American
Psychiatric Association, 2000). The DSM-IV-TR is a working medically based manual
used not only by psychologists but also by many other health care practitioners, and
remains important in this constantly changing field of science. Therefore, it has been
under constant revision since its inception, attempting to alleviate weaknesses in previous
editions and to facilitate diagnosis of individuals based on new research findings (Morgan
et al., 2000).
Achieving individual diagnostic accuracy especially with children and adolescents
has been challenging and is also well documented (Aklin & Turner, 2006; Gerber,
Appleton, Dykeman, Sampson, & Toews, 1994; Smith, Muir, & Blackwood, 2004).
Although children and adolescents present with their own particular mental-health issues,
until recently research and treatment have relied heavily on adult research and treatment
guidelines for diagnosis and treatment of these specific groups. This situation has led to
the likelihood of misdiagnosis or under-diagnosis within the field (Aklin & Turner, 2006;
Gerber et al., 1994; Tolan & Dodge, 2005). For example, misdiagnosis based on adult
guidelines can result in children receiving medications that do not produce the desired
clinical effects and which may also result in serious negative outcomes. Misdiagnosis has
also been related to longer and more costly treatment regimens. These negative aspects
have resulted in a general lack of trust in the profession by both clients and their families
(Aklin & Turner, 2006; Smith et al., 2004). In an effort to lessen the likelihood of
misdiagnosis and eliminate the need for costly testing, psychologists have begun to look
at the use of clinical base rates and evidence-based practice, as previously discussed,
when diagnosing children and adolescents (Youngstrom & Duax, 2005).
To assist in diagnostic decision making, psychologists have often relied on the use
of lengthy multimethod assessments to ensure a more global perspective on the individual
(Archer et al., 1991; Dryden, 1986). Multimethod assessments within a psychological test
battery involve several informants’ reports of the child’s behavior, as well as direct
measures of the child’s intelligence and abilities. For many reasons, including possible
denial, mistrust of professionals, or limited insight, children may not be the best reporters
of their own problems. Therefore, professionals have resorted to obtaining information
regarding children from other sources, often in the form of checklists (Achenbach, 1991).
It is understandable that reports of children and adolescent behaviors elicited at school
may differ from behaviors seen within the home and vice versa. Therefore, information
regarding the parent-child relationship and the teacher-child relationship is thought to add
to the global picture in an assessment (Achenbach, 1991; Achenbach & Rescorla, 2001).
Along with reported views of the child, a multimethod assessment would include
psycho-educational testing instruments, personality profiles, behavioral checklists and
mental status examinations. Direct assessment of a child’s skills or functioning is
obtained through intellectual evaluation and cognitive assessment. Performance results
are standardized and compared to the results of a reference group, normed either for age
or grade. Some of the more common tests include the Wechsler Intelligence Scales and
the Woodcock Johnson Scales (Archer et al., 1991). Psycho-educational testing alone
does not specifically address mental-health and pathological concerns and tends to focus
only on intellect and ability. Thus, personality or psychopathology is often mainly
assessed in the form of behavioral checklists as the increase in the use of empirical
methods has caused psychologists to be more cautious in the use of projective measures
(Cashel, 2002; Elbert & Holden, 1987; Kamphaus et al., 2000).
Predoctoral internship programs offering psychological training for child
psychologists have found that generally teaching methods have remained relatively stable
over the past two decades (Archer et al., 1991; Cashel, 2002). However, changes within
general clinical practice have occurred. These include, as noted, school-based assessment,
growth of abbreviated IQ measures, and popularity of behavioral measures over
projective measures (Cashel, 2002; Elbert & Holden, 1987; Kamphaus et al., 2000).
Clinical researchers have also become aware of the need to develop a thorough
and efficient intake protocol geared toward accurate diagnosis of childhood mental-health
disorders (Youngstrom, 2001). As previously discussed, psychology has traditionally
utilized both objective and subjective measures to aid the clinician in reaching accurate
diagnoses. Since the release of the original Binet intelligence test in 1908, assessment
measures have expanded to include not only tests for intelligence, but also for
achievement, vocational aptitude, personality, neuropsychological functioning, development, and
individual behavior. Currently clinicians also use many different subjective, objective,
self-report, checklist, or observational techniques. The development of the empirically
based DSM-IV-TR classification system of diagnosis has resulted in more reliance by
clinicians on formal data collection of symptoms, identification of problem behaviors,
and determination of functioning (First et al., 2004).
Therefore, the initial interview and diagnostic formulation become pivotal points
in the treatment process. The interview or assessment method used to obtain pertinent and
valid information should include the following: development, family and social history,
parental description of everyday behavior, activities of the child, direct assessment of the
child’s functioning, and the child’s self-perception (Charman & Baird, 2002). A continual
quandary for many practitioners is the need to get as much information as possible within
a limited time frame and to balance the rigorous task of information gathering with that
of cost effectiveness.
One evidence-based method developed to help obtain accurate diagnoses and
reduce clinician diagnostic error allows psychologists to use the statistically based
method of probabilities used by medical professionals. The Diagnostic Likelihood Ratio
(DLR) method was adapted for use by psychologists and incorporated the inclusive
symptom approach of the DSM-IV-TR with quantitative analyses (Youngstrom & Duax,
2005). This evidence-based approach to assessment allows the clinician to weigh the use
of various costly testing measures against the likelihood of supporting or negating
possible diagnoses. Evidence-based practice if used in conjunction with accurate base
rates and likelihood ratios can reduce the previously discussed conflict between costly
batteries of tests and limited time for diagnosis. However, there is a twofold problem in
incorporating this approach to assessment. The practicing clinician needs to know the
client base rate, which is based on the likelihood of anyone in the clinician’s market area having the
specific diagnosis as well as the best assessment measures to evaluate for that specific
disorder. More simply stated, the question is whether a particular assessment is going to
add to the clinician’s fund of knowledge about a specific diagnosis and will aid in
reaching an accurate diagnosis in a timely and cost effective manner.
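The way a base rate and a likelihood ratio combine can be illustrated with a short Python sketch. It is a minimal example of the general odds-based updating rule, not a procedure taken from this study; the function name and the example values are hypothetical.

```python
def posttest_probability(base_rate: float, dlr: float) -> float:
    """Update a base rate (pretest probability) with a diagnostic likelihood ratio.

    Steps: probability -> odds, multiply the odds by the DLR, odds -> probability.
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")
    if dlr < 0.0:
        raise ValueError("dlr must be non-negative")
    pretest_odds = base_rate / (1.0 - base_rate)
    posttest_odds = pretest_odds * dlr
    return posttest_odds / (1.0 + posttest_odds)
```

For example, with a hypothetical clinic base rate of 0.20 and a test result carrying a DLR of 4, the pretest odds of 0.25 become posttest odds of 1.0, which corresponds to a posttest probability of about 0.50. A DLR of 1 leaves the base rate unchanged, which is the sense in which such a test adds nothing to the clinician's fund of knowledge.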
Research Focus
The purpose of this study was to compare the BGII drawings, using the BGSS and
Koppitz 2 scoring methods, with the diagnostic accuracy of the WASH-U-KSADS-PL+
interview and the popular CBCL measure to determine the diagnostic utility of the BGII
as a decision making tool. In the current study, the BGSS and the Koppitz 2 scoring
system for the BGII test were evaluated in a research setting to assess their contribution
as projective screening measures that would aid diagnosis and treatment planning in
clinical practice with children. It has been the hope of this researcher to begin a process
leading to an increase in the clinical utility of the BGII not only as a developmental visual
motor performance test but also as an additional effective assessment tool in a
psychological test battery. Since this is exploratory research, the BGII, BGSS and the
Koppitz 2 scoring results have been used. It was predicted that the BGSS and the Koppitz
2 would do equally well in screening children for visual motor integration errors. It
remains uncertain whether they can effectively reach a diagnosis when compared with the
WASH-U-KSADS-PL+ and the CBCL in a clinical sample of children. Both scoring
measures of the Bender Gestalt II Test, although similarly normed by current research
and measurement techniques, may yield different results.
The WASH-U-KSADS-PL+ (Youngstrom et al., 2005) is currently being used in
an ongoing study “Assessing Bipolar Disorder: A Community Blend (ABACB)” to
assess accuracy of childhood diagnosis with regard to childhood bipolar disorder in
Cleveland, Ohio, through Case Western Reserve University. Permission to conduct the
larger study was originally given by the Case Western Reserve University Institutional Review Board
in 2003 as a 5-year study comparing different instruments in diagnostic assessment for
juvenile bipolar disorder. Addenda to this study have been routinely added during the
yearly review process to compare various measures, both research and clinically driven.
The University of North Carolina Institutional Review Board further reviewed this study
in 2006 for data collection at an additional site. Once the IRB addendum was approved
through Case Western Reserve University, this study was presented to the West Virginia
University Institutional Review Board (Youngstrom, 2005). West Virginia University
determined the data collection was archival and the present study was given exempt
status. All Institutional Review Board releases can be viewed in Appendix A.
The major purpose of the Case Western Reserve University study with sites at
Applewood Centers Inc., Case Western Reserve University, and University of North Carolina was
to clarify the characteristic features of bipolar disorder in children, to cross
validate a childhood bipolar screening protocol for use in clinical settings, and investigate
the developmental changes in symptoms across the age span using both cross sectional
and longitudinal approaches (Youngstrom, 2005). Because one of the main purposes of
the research study was the validation of potential screening measures, these screening
tools were segregated from the LEAD assessment and reviewed and analyzed separately
(Youngstrom et al., 2005). The BGII was one screening tool that was added in the fifth
year of the study and was also segregated from the LEAD process (see Appendix B). If
any of the Bender results had been found to be an adequate predictor of any specific
diagnostic criteria then further analyses would have been conducted to determine its
future use by developing a diagnostic likelihood ratio. In this study, a DLR was only
performed for children with the diagnosis of ADHD based on the KSADS-PL+ with
LEAD results (Frazier, 2006; Jaeschke, Guyatt, & Sackett, 1994; Youngstrom, 2006).
The Bender Visual Motor Gestalt Test
The OBG has been commonly used as a quick, simple, direct measure of the
individual’s internal psychological state (Archer et al., 1991; Bender, 1938; Piotrowski
& Keller, 1989). The originally titled OBG, or the Bender Visual Motor Gestalt Test
(Bender, 1938), was composed of simple geometric drawings developed by Max
Wertheimer (1880-1943) in his perceptual psychology experiments and later formalized
by Bender (Archer et al., 1991; Bender, 1938; Koppitz, 1975).
According to Bender, developmental maturation of drawing ability is an ongoing
process that follows sequential stages, incorporating gross to fine-motor skills, visual
imagery, and perceptual awareness and a developmental drawing tendency from vertical
to horizontal movements, and finally from two dimensional to three dimensional
awareness. These stages were all considered part of the maturational process leading to
the more intricate representations found within a mature representational drawing that
become a complete gestalt or perceptual integration (Bender, 1938). Bender theorized
that a deviation within this maturational process would obviously lead to a disintegration
of the original representation or errors between the drawing and its stimulus. As
previously noted, psychologists have found this concept helpful in identifying those
individuals who have not yet developmentally matured, or are delayed in their visual
motor perception, as well as those who were once psychologically mature yet for various
possible reasons may have lost such integration (Bender, 1938; Brannigan & Decker,
2003, 2006; Koppitz, 1975; Reynolds, 2007).
The relationship of visual-spatial skills and working memory in areas of
intelligence when correlated with tasks of executive functioning has been more recently
addressed by Miyake, Friedman, Rettinger, Shah, and Hegarty (2001). These researchers
found that people who are good at complex visual-spatial tasks also perform better on
executive function tasks which are crucial in regulating and controlling behavior (Miyake
et al., 2001). Shapiro and Simpson (1995) found the Koppitz scoring system in disturbed
adolescents with emotional and behavioral disorders did not measure intelligence, but
was able to interpret maturational development, which correlates to the use of the BGII.
The researchers evaluated a clinical group of adolescents 12 to 17 years of age and
determined that visual motor skills continue to develop beyond the age of 11 and the
scoring results were not found to be related to initial diagnosis, gender, or intelligence
level of the adolescent (Shapiro & Simpson, 1995). Their work with the original Koppitz
scoring system provided useful information with regard to adolescents for this research
study.
As previously stated, the OBG was one of the most popular assessment measures
for more than 60 years and it was consistently listed as one of the top used instruments by
psychologists when queried over those years (Archer et al., 1991; Brannigan & Decker,
2003; Piotrowski & Keller, 1989). There have been many researchers who cited the OBG
as the criterion measure in their research (Brannigan & Decker, 2003; Reynolds, 2007;
Hutt, 1985). Horn and O’Donnell (1984) studied the early identification of learning
disabilities and found that the OBG was effective in identifying those children classified
as learning disabled as well as those with low achievement. The OBG was found to be
useful in separating those adolescents diagnosed with impulsivity from those without
impulse difficulties (Oas, 1984). The Bender Gestalt–Recall technique was found to be a
valid measure of short-term visual memory in children and adolescents and comparable
to the Coding recall measure on the WISC-III (Imm, Kim, Belter, & Finch, 1991). Since
the OBG has been used in over 1,300 published articles and 60 years of research, its
strengths and weaknesses are well documented (Brannigan & Decker, 2003).
The use of the OBG became popular and widespread even with cautions on its use
as a projective and neurological test. It continues to be a preferred measure possibly
because of its simplicity and short administration time (Bigler & Ehrfurth, 1981). The
popularity of the Bender as a clinical tool has led to the development of a plethora of
scoring measures and interpretive manuals over the years, although some exhibit highly
questionable reliability and validity (Brannigan & Decker, 2003; Hutt, 1985; Perticone,
1998).
As a result, the broad utility of the OBG has come into question over time. Many
psychologists have cautioned against wide use of the OBG for diagnostic evaluations
(Bigler & Ehrfurth, 1981) because of low reliability with regard to its use as a projective
test (Naglieri & Pfeiffer, 1992). However, it still remains one of the top 10 instruments
used by psychologists (Piotrowski & Keller, 1989). Thus, the OBG has had two preferred
uses in a childhood population: first, as a visual motor development instrument and then
as a projective measure for identification of certain psychological conditions in both
children and adults (Rossini & Kaspar, 1987).
Current Use of the Bender Visual Motor Gestalt—Second Edition
The Bender Gestalt II (BGII; Brannigan & Decker, 2003) is a modification of the
OBG and maintains the importance of the quality of the drawing as its basis of scoring.
However, the number of designs was increased from 9 to a total of 16 (Brannigan &
Decker, 2003). Four of the new designs are to be given to children younger than 8 years
old and precede the original set of drawings. When giving the Bender Gestalt II drawings
to anyone 8 years old or above, there are three additional drawings that follow the
original administration (Brannigan & Decker, 2003).
Once developed and refined, the BGII was renormed using a national
representative sample obtained through the 2000 US census as well as clinical samples of
selected diagnoses (Brannigan & Decker, 2003). Another change in the BGII is that it
now includes perception and motor subtests to detect specific problems separate from
integrative processes for those individuals who perform below expectation (Brannigan &
Decker, 2003). To measure visual-motor integration skills in children and adults
from 4 to 85 years of age, the BGII is administered in two stages. The first consists of
a Copy and Recall phase, followed by two supplementary tests (the Motor Test and the
Perception Test).
On the BGSS, scoring is performed by conversion of raw scores to Standard
Scores ranging from 40 to 160 (M = 100, SD = 15). Scores can also be converted to
percentile ranks, t scores, z scores, and age equivalents. Individuals scoring 1.3
standard deviations below the mean Visual Motor Index of 100 were considered to be
“mildly impaired” and those scoring more than 2 standard deviations below the mean
were considered “significantly impaired” (Brannigan & Decker, 2003). Above average
performance on the BGSS and Koppitz 2 presupposes normal levels of both visual-
perceptual skills and fine-motor coordination. Below-average performance may be due
to problems in the domain, the integrative process, motivation, attention, or related
concerns (Reynolds, 2007). A recent study examined the use of the BGII with children
diagnosed with ADHD. Results indicated that those with ADHD tended to do more
poorly than a normal child group. However, these differences disappeared when
intellectual level was statistically controlled (Allen, 2005).
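The score conversions described above (standard scores with M = 100 and SD = 15, re-expressed as z scores, t scores, and percentile ranks) can be sketched as follows. This is only an illustration: the percentile step assumes a normal distribution, whereas a test manual would normally supply norm-based tables, and all function names are hypothetical.

```python
from statistics import NormalDist

SCALE_MEAN = 100.0  # mean of the standard-score metric
SCALE_SD = 15.0     # standard deviation of the standard-score metric


def standard_to_z(standard_score):
    """Express a standard score as a z score (distance from the mean in SDs)."""
    return (standard_score - SCALE_MEAN) / SCALE_SD


def z_to_t(z):
    """Convert a z score to a t score (M = 50, SD = 10)."""
    return 50.0 + 10.0 * z


def z_to_percentile(z):
    """Approximate percentile rank, ASSUMING normally distributed scores."""
    return 100.0 * NormalDist().cdf(z)


def sd_cutoff(sds_below_mean):
    """Standard score lying a given number of SDs below the mean."""
    return SCALE_MEAN - sds_below_mean * SCALE_SD
```

Under these assumptions, the 1.3 SD criterion corresponds to a standard score of 80.5 and the 2 SD criterion to a standard score of 70.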
Despite the 2003 revision of the OBG, it is still regarded as controversial by many
in the field, as will be discussed in greater detail in subsequent sections. However, as
noted in the purpose of this research, there are those who hope to rectify this situation and
establish the BGII as a clinically useful tool.
The Koppitz 2 and the Koppitz Developmental Scoring System
The Koppitz 2 scoring system for the BGII is a revision of the OBG Koppitz
developmental scoring system developed in 1963 by Koppitz, a child psychologist
(Reynolds, 2007). Koppitz (1975) developed her own specific instructions for
administration and scoring using a developmental approach with scores based on errors
found within the design construction (Perticone, 1998). Scoring was based on total
number of errors throughout the OBG. The original Koppitz was revised by Reynolds, a
colleague and personal friend of Koppitz. In her early research, Koppitz had concluded
that both developmental and emotional results had an overall diagnostic value and
represented a possible indication of emotional disturbance within an individual
(Perticone, 1998). Koppitz (1975) was guarded about generalizing individual indicators
as personality characteristics. She saw them as more of a guide for further inquiry
(Perticone, 1998). Following the introduction of the BGII, the need for revisions in the
Koppitz Developmental scoring system became clear, resulting in Reynolds developing
the Koppitz 2 Developmental Scoring System.
Along with the developmental scoring, Koppitz included a scoring manual for
Emotional Indicators, to differentiate children having emotional difficulties from those
appearing well adjusted (Perticone, 1998; Reynolds, 2007). The Koppitz Emotional
Indicators were developed using empirical methods based on psychodynamic theory to
distinguish children with serious emotional problems from those without problems.
Koppitz (1975) reported that emotional indicators were clinical symptoms or signs that
should be individually evaluated by the rater. A single indicator by itself may not indicate
serious pathology; however, a single indicator may show a manifestation or tendency
toward some disturbance. Therefore, indicators could occur separately or in combination.
The emotional indicators identified for scoring on the OBG were confused order, wavy
lines, dashes for circles, increasing size, large size of drawing, or overly small size of the
drawing, fine line, overworked or reinforced lines, second attempt of a drawing, and use
of two or more sheets of paper to complete the test. However, these remain controversial
due to their projective nature (Koppitz, 1975; Perticone, 1998; Reynolds, 2007).
Additionally, Koppitz (1975) cautioned that the indicators were not comparable or
correlated to the developmental test score.
The Koppitz 2 authors reviewed earlier research supporting the emotional
indicators based on over 500 studies from Koppitz and others. They removed those that
were demonstrated to be age related, and then included two indicators that were of
considerable empirical significance based on previous research (Reynolds, 2007). The
Koppitz 2 emotional indicators added drawing a box around one or more designs and
spontaneous elaborations or changes in the overall gestalt of the design to the 10
emotional indicators reported by Koppitz in 1975. This resulted in a final total of 12
emotional indicators that when found in a profile were to be taken into consideration for
concern by the clinician (Koppitz, 1975; Reynolds, 2007; Rossini & Kaspar, 1987). The
original research for the Koppitz Emotional Indicators reported that three or more errors
were indicative of possible pathology (Koppitz, 1975). However, the current Koppitz 2
Emotional Indicators scoring is quantitative with more than four considered to be high
risk or of concern for emotional difficulties (Reynolds, 2007).
Archer et al. (1991) researched psychological test usage among psychologists and
found that the OBG rated third among all assessment measures for use with adolescents.
According to this research, the OBG was primarily reported as diagnostically useful in
learning disability assessments. This leads to a larger question: Exactly what resultant
data have been derived from the clinical use of the OBG and now the BGII? Is it a viable
screening method that can add information to the diagnostic assessment interview? Or
should the BGII be replaced by more structured diagnostic techniques to arrive at the
same necessary information? Finally, is the BGII of diagnostic benefit to include as an
assessment measure in children and adolescents?
Kiddie Schedule for Affective Disorders and Schizophrenia for School-Age Children
The Kiddie Schedule for Affective Disorders and Schizophrenia for Children (K-
SADS) is a semistructured interview. Methods of clinical interviewing have changed
since 1938 when the Bender cards were first introduced as a screening instrument (Aklin
& Turner, 2006). In an attempt to reduce the high error rate, structured and
semistructured interviews have been developed to replace the open-ended, more
unstructured interview process (Aklin & Turner, 2006). The structured interview process
provides systematized ratings, outlining specific behaviors and symptoms that need to be
addressed using a standardized format. These standardized formats, however, have
limitations. The structured interviews have been criticized for their inflexibility and lack
of depth in obtaining information (Aklin & Turner, 2006). Semistructured interviews
offer more latitude to the clinician for the diagnosis of patient symptoms. Although the
semistructured interviews alone may be less reliable due to the individual style of the
clinician, the qualitative and quantitative analysis found that the combination of the
semistructured interview and the use of empirical screening measures may yield the
deepest understanding of the client’s situation (Aklin & Turner, 2006; Findling et al.,
2001; Findling et al., 2005).
The K-SADS is a semistructured diagnostic interview for children and
adolescents (6-18 years) designed to assess current and past episodes of psychopathology
according to DSM-IV-TR criteria. The K-SADS is administered by interviewing the
parent(s) and the child, and finally developing summary ratings which include all sources
of information (parent, child, school, chart, and other collateral sources).
The KSADS-PL+ and the CBCL have undergone considerable review and found
to have acceptable validity and reliability (Achenbach System of Empirically Based
Assessment [ASEBA], 2006; Achenbach, 1991; Ambrosini, 2000; Ghanizadeh et al.,
2006). However, they are very different instruments (ASEBA, 2006; Aklin & Turner,
2006; Chambers et al., 1985; Findling et al., 2001; Findling et al., 2002). The KSADS is
a research derived instrument with limited use in clinical practice due to the cost and
administrative time demands (Chambers et al., 1985). The CBCL, on the other hand, is
both time and cost efficient. It consists of a behavioral checklist of distressing behaviors
to the responding individual. It is generally used in clinical practice as a screening
instrument (ASEBA, 2006).
The K-SADS is a semistructured interview that was developed in the late 1970s
by Drs. Puig-Antich and Chambers (Ambrosini, 2000). The original version of the
KSADS has gone through several modifications. The K-SADS PL+ includes the present
and lifetime edition of the KSADS along with the Youth Mania Rating Scale questions to
cover all possible symptoms of bipolar disorder (Findling et al., 2005; Gracious,
Youngstrom, Findling, & Calabrese, 2002). The K-SADS PL+ is DSM III-R (1987) and
DSM-IV (1994) compatible and includes interviews with both the parent and child. The
administration of the KSADS-PL+ takes approximately 90 minutes for each interview
and can take longer depending on the number of supplemental diagnostic areas that are
deemed significant (Ambrosini, 2000; Findling et al., 2001; Findling et al., 2005). The
KSADS, as well as the KSADS-PL+, allows for clinical judgment based on the research
diagnostic criteria and the placement of a specific symptom within the diagnostic
criterion (Ambrosini, 2000; Findling et al., 2001; Findling et al., 2005).
The Wash-U-KSADS-PL+ interview is based on DSM-IV-TR diagnostic criteria,
and as previously discussed in regard to the DSM-IV-TR, there can be considerable
overlap across diagnoses and differing clinician interpretations of a symptom. Using the
KSADS, KSADS PL+, or WASH-U-KSADS-PL+ versions can be an arduous process for the
clinician: training for test administration focuses on a clear understanding and
interpretation of various symptoms, administration must adhere to a strict protocol, and
interpreting the interviewee’s responses is itself a difficult task
(Findling et al., 2001, 2005; Youngstrom et al., 2002). WASH-U-KSADS PL+
training requires both a didactic and interview-observer component in order to teach the
interview process and ensure scoring reliability (Ambrosini, 2000; Geller et al., 2001).
The K-SADS PL+ is administered by interviewing the parent and the child and finally
developing summary ratings, which include all sources of information. At some current
research sites, all the raters then meet for a consensus of the different summary ratings to
determine clinical diagnoses; this is known as the longitudinal expert evaluation of all
available data or LEAD (Spitzer, 1983).
The LEAD assessment involves a formal review of all the data by an expert or
team of experts (Klein et al., 1994; Pilkonis et al., 1991). LEAD is implemented
following interviews and includes a final review of all presenting data under the direction
of a licensed child psychologist at one of the current sites (Youngstrom et al., 2005).
LEAD may be implemented in person, by telephone, or video conferencing with various
raters reviewing the results of the KSADS-PL+. The final diagnoses are determined
based on percentage of certainty by trained raters (Youngstrom, 2005). Although raters
are highly trained, which enhances scoring reliability, the interviews are very time
consuming, so the KSADS PL+ with LEAD is not likely to be used in most clinical
practices. It is, however, used for research purposes and compared with commonly used
assessment measures (Findling et al., 2002; Findling et al., 2005; Youngstrom et al.,
2005). The WASH-U-KSADS PL+ with LEAD is a thorough and lengthy interview. It
can be useful in achieving accurate diagnoses using the DSM-IV-TR criteria (Ghanizadeh
et al., 2006). As previously stated, it is, however, impractical for the practicing clinician
due to its labor intensive administration and low cost efficiency.
The Child Behavior Checklist
The Child Behavior Checklist (CBCL) by Achenbach and Rescorla (2001) is a
commonly administered checklist given to both parent and child to disclose current
symptomology (Achenbach & Rescorla, 2001; ASEBA, 2006). It assesses 120 emotional,
behavioral, social, and psychological problems frequently reported by parents and is
meant to be part of multi-informant instruments of empirically based assessments
originally developed by Achenbach (1991) and later revised by Achenbach and Rescorla
(2001). It assesses a broad range of behavioral symptoms found among children with
emotional difficulties (Achenbach, 1991). The revised CBCL resulted in the development
of two different formats: the CBCL for ages 1 to 5 years (CBCL/1-5) and the CBCL for
ages 6 to 18 years (CBCL/6-18) (Achenbach & Rescorla, 2001; ASEBA, 2006). The
Achenbach behavioral checklists are multiinformant instruments and have three different
formats which include: parent, teacher, and self report forms. The Achenbach Youth Self
Report Form is a standardized forced response test designed for 11 to 18 year olds to
report their own strengths and areas of difficulties. Rater answers are based on accuracy
of each item currently or within the previous 6-month period (ASEBA, 2006). The CBCL
provides t scores and percentiles for three competency scales (Activities, Social, School),
total competency in these areas, internalizing problems scale, externalizing problems
scale, and total problems scale. There are also eight syndromes that can be differentiated
on the CBCL: aggressive behavior, anxiety/depression, attention problems,
rule-breaking behaviors, social problems, somatic complaints, attention deficit hyperactivity
problems, oppositional defiant problems and conduct problems (ASEBA, 2006). The
CBCL was intended to serve as one component of a multimethod empirically based
assessment and record children’s competencies and deficiencies as reported by their
parents, parent surrogates, or teachers. The CBCL provides clinical information relating
to strengths and competencies as well as problems within the individual (internalizing or
externalizing problems).
The CBCL has been found to be useful in assessing child symptomology based
on clinical levels of internalizing problems and/or externalizing problems, and
distinguishing attention deficit disorder from other disorders (Ivanova et al., 2007;
Lengua, Sadowski, Friedrich, & Fisher, 2001). In research by Ivanova and others, the
CBCL was found to be useful across cultures using the eight-syndrome structure across
30 diverse societies throughout the world (Ivanova et al., 2007). However, the CBCL,
similar to the DSM-IV-TR, has been criticized for its overlapping of diagnoses and a
tendency to over-estimate the co-occurrence of diagnoses (Lengua et al., 2001).
One study found the CBCL to be the most commonly utilized behavioral checklist
among clinical and school psychologists for use with children (Cashel, 2002). In recent
years, behavioral checklists and rating scales have replaced personality and projective
measures as the instrument of choice among psychologists (Kamphaus et al., 2000). The
popularity of behavioral checklists has been found to be related to their time efficiency,
straightforwardness and ease in quantifying the results (Kamphaus et al., 2000). However,
as with any testing method, there are limitations. The self-report measures may raise
questions of honesty and defensiveness of the reporter (La Fiosca & Loyd, 1986).
Validity of self-reports has been found to improve when there are different sources of
measurement. The reports often include parent, child, and teacher versions when
screening (Achenbach & Rescorla, 2001). This researcher supports the use of behavioral
checklists to aid in diagnostic decision making and finds the CBCL is an adequate
assessment tool to use for children within this current study. Although the Youth Self
Report (YSR) by Achenbach (1991) would have been an ideal comparison assessment,
because it is geared for children older than 11, it was not used as a comparison measure
due to the younger aged children assessed.
As described in this section, the OBG has been used for many years, and previous
research into the BGII lays the groundwork for the current study examining the utility of
the BGII and the previously discussed scoring methods. This current study also supports
the assessment tools of the WASH-U-KSADS-PL+ and the CBCL as diagnostic decision
making tools. How these instruments are used in the current study will be further
addressed within the procedure section of the next chapter.
Chapter 3: Method
This study is comparative research using archival data from a 1-year subset of a
larger 5-year study that included over 600 participants. This parent study aimed at
investigating the effectiveness of screening measures in diagnosing bipolar disorder in
children and adolescents. For my study, a total of 115 children were initially evaluated
and 75 completed all the procedures of the research protocol comprising the KSADS-
PL+, the CBCL, and the Bender Visual Motor Gestalt Test II. The data for all subsequent
analyses are derived from these 75 participants, as will be clarified below.
Participants
Inclusion criteria for this research as well as the larger general study were (a)
youth between the ages of 5 years and 18 years, (b) written consent and assent from both
caregiver and client, (c) both caregiver and youth presented for the assessment, and (d)
both caregiver and youth were functional English speakers. Participants for the larger
study were 620 caregivers and youth invited from the intakes of a community mental-
health center in Cleveland, Ohio, following a consecutive case-series design. A
consecutive case series design reports the outcomes of a group of individuals or clients
with a similar condition treated in the same manner (McKeon, Medina, & Hertel, 2006).
The parent study enrolled participants between July 2003 and March 2008, while my
research, which focuses on the BGII as a screening instrument, covered a period of time
from May 2007 through March 2008.
As noted, my study reports findings for the 75 youth, ages 5 to 18, with complete
data for all measures. The most common reason for missing data was no Koppitz scoring
for the youth (n = 26) followed by a missing CBCL (n = 7). There were seven other
individuals whose protocols were missing a variety of data points. Of the 75 youth
completing all elements of the study the average age was 11.92 (SD = 2.56), with a
median of three diagnoses, 64% male, 84% African American, 11% Caucasian, and 5%
other. Participants enrolled in the larger study over the 5-year time period between ages
5-18 were on average 12.00 years old (SD = 2.67), with a median of two diagnoses, 89%
African American, 7% Caucasian, and 4% other. All subsequent data points are derived
using an n = 75. An independent-samples t test for the equality of means indicated
that youth in the current study were not significantly different in age or comorbidity from
other youth in the larger study, t(74) = .817, p = .076. Chi-square analyses indicated no
significant difference between those who received the BGII and those who did not in
ethnicity, χ²(3) = 2.86, p = .41, or in gender, χ²(1) = 1.67, p = .20.
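As a check on how the p values for the chi-square statistics reported above follow from the statistic and its degrees of freedom, a minimal standard-library sketch is given below. It computes the upper-tail probability from the regularized incomplete gamma recurrence for integer degrees of freedom; the function name is hypothetical, and this is not the software used in the study.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-square variable.

    Works for positive integer df using the recurrence
    Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1), with t = statistic / 2,
    starting from Q(1/2, t) = erfc(sqrt(t)) for odd df or Q(1, t) = exp(-t) for even df.
    """
    if statistic < 0 or df < 1 or int(df) != df:
        raise ValueError("statistic must be >= 0 and df a positive integer")
    t = statistic / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Statistics as reported in the text: ethnicity (df = 3) and gender (df = 1).
p_ethnicity = chi_square_p_value(2.86, 3)
p_gender = chi_square_p_value(1.67, 1)
```

Both values round to the p values given in the text (.41 and .20).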
Measures
The Bender Visual Motor Gestalt II-Global Scoring System. This measure
was designed to assess visual motor integration across the lifespan and aid in
differentiating learning problems as well as psychological and neurological problems
(Brannigan & Decker, 2003). The BGII consists of the original Bender drawings plus
additional drawings added by Brannigan and Decker (2003) to expand testable age range
and provide more discrimination for levels of visual motor perception. Compared to the
OBG, the BGII includes new items (k = 16 on the BGII versus k = 9 on the OBG),
a memory recall phase, national norms, clinical validity studies, time estimates,
quantitative and qualitative scoring, test observation forms, and co-norming with the
Stanford-Binet (Brannigan & Decker, 2003). Both the original and new figures were
based upon laws of perception postulated by Wertheimer (1923). Figures appear
individually on numbered cards. Administration of the BGII consists of two phases: copy
and recall. The copy phase measures visual motor integration. During the copy phase,
cards with a single figure are shown to the individual client one at a time. Participants
then copy each figure onto a blank paper with a pencil. Immediately after the copy phase,
the recall phase occurs. The recall phase measures short-term memory. The individual
draws as many of the figures as he or she can remember on a new blank piece of paper
immediately following the copy phase.
For the BGII, the Global Scoring System (BGSS) replaced the plethora of
alternate scoring systems that had been used for the OBG (Brannigan & Decker, 2003).
The BGSS is a measure to assess the accuracy of the BGII drawings. Scoring is based on
clinician comparison of examinee drawings to examples in the manual (Brannigan,
Decker, & Madsen, 2004). Each figure is scored on a 5-point Likert-type rating scale (0
to 4). A 0 means the drawn figure did not represent the stimulus at all. A score of 4
indicates the drawn figure matches the stimulus nearly perfectly. Drawings are rated from
both the copy and recall phases of administration using the BGSS. The BGSS produces a
sum raw score of all items administered. The raw scores are then converted to standard
scores based on similarly aged individuals within the norming sample (Brannigan &
Decker, 2003). Standard scores range from 40 to 160 (M = 100, SD = 15) (Brannigan &
Decker, 2003). Classification ranges based on standard scores are represented by the
following: 145 to 160, extremely high or extremely advanced; 130 to 144, very high or
very advanced; 120 to 129, high or advanced; 110 to 119, high average; 90 to 109,
average; 80 to 89, low average; 70 to 79, borderline delayed; 55 to 69, mildly delayed or
low; and 40 to 54, moderately delayed or extremely low (Brannigan & Decker, 2003).
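The classification ranges listed above amount to a simple lookup from standard score to descriptive label. A minimal sketch follows; the table values are those stated in the text, while the function and constant names are hypothetical and integer standard scores are assumed.

```python
# (lowest score, highest score, descriptive label), as listed in the text.
CLASSIFICATION_BANDS = [
    (145, 160, "extremely high or extremely advanced"),
    (130, 144, "very high or very advanced"),
    (120, 129, "high or advanced"),
    (110, 119, "high average"),
    (90, 109, "average"),
    (80, 89, "low average"),
    (70, 79, "borderline delayed"),
    (55, 69, "mildly delayed or low"),
    (40, 54, "moderately delayed or extremely low"),
]


def classify_standard_score(score):
    """Return the descriptive label for an integer standard score (40 to 160)."""
    for low, high, label in CLASSIFICATION_BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"standard score {score} is outside the 40 to 160 range")
```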
Validity and reliability of the BGSS were originally discussed by Brannigan and
Decker (2003) during the test revision process. Construct validity of the BGSS is
assessed by comparison with the Beery-Buktenica Developmental Test of Visual Motor
Integration, Fourth Edition (VMI-IV). The BGSS standard scores demonstrate strong,
positive correlations with the VMI-IV on both copy (r = .65) and recall (r = .44)
(Brannigan, Decker, & Madsen, 2004). Additionally, the BGSS was compared to tests of
achievement and cognitive ability. The BGSS has demonstrated modest correlations with
the Woodcock-Johnson III Tests of Achievement (Woodcock, McGrew, & Mather,
2001). The BGSS shows modest correlational scores ranging from .27 to .53 for the Copy
Phase, and .25 to .49 in the Recall phase (Brannigan & Decker, 2003). In a study examining
the relationship between the BGSS and the Stanford-Binet Intelligence Scales, Fifth
Edition (SB-5; Roid, 2003), corrected correlations between the BGSS and the SB-5 IQ scores
ranged from .50 to .54 for the Copy phase and .45 to .48 for the Recall phase. Interrater
reliability of the BGSS was extremely high, yielding a kappa of 0.90, p < 0.05. Another
measure of reliability of the BGSS was found using the split-half method for internal
consistency; coefficients corrected with the Spearman-Brown prophecy formula exceeded
0.91 at all age ranges, indicating
a consistent and stable measurement (Brannigan & Decker, 2003). As reported in the
manual, the BGSS demonstrates adequate validity and reliability for measuring visual-
motor integration (Brannigan & Decker, 2003).
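The two reliability statistics named above can be sketched briefly. The first function applies the Spearman-Brown prophecy formula to step a split-half correlation up to full test length; the second computes Cohen's kappa for two raters assigning categorical ratings. Both follow the standard textbook definitions, the function names are hypothetical, and no data from the BGII manual are reproduced here.

```python
from collections import Counter


def spearman_brown(half_test_correlation):
    """Step a split-half correlation up to full-length reliability."""
    r = half_test_correlation
    return 2.0 * r / (1.0 + r)


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(ratings_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal proportions.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)
```

For instance, a split-half correlation of .50 steps up to about .67, which shows why split-half estimates are corrected before being interpreted as full-test reliability.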
In addition to the BGSS, other researchers have used the revised Koppitz 2 by
Reynolds (2007) to score the figures of the BGII. For this research, Koppitz 2 scoring
occurs after administration of the BGII and BGSS scoring. The Koppitz 2 Emotional
Indicators score was incorporated and scored along with the Koppitz 2 Visual Motor
Integration scoring. The Koppitz 2 is a scoring system used to derive assessment data
from the Bender Gestalt II (2003). It provides an overall visual-motor integration score
that shows the ability to relate visual stimuli to motor responses in an accurate and
appropriate manner (Reynolds, 2007). The Koppitz 2 Visual Motor Index uses an age
corrected deviation scaled score set (M = 100, SD = 15). As previously mentioned, the
categorical descriptive ratings are identical to those of the Bender Visual Motor Gestalt
Test II-Global Scoring System results. Koppitz found significant differences between
children with learning or behavioral difficulties and those who demonstrated average
performance (Brannigan & Decker, 2003). Koppitz recognized excessive time taken for
test completion as a possible indicator of psychopathology. The time normally needed to
complete the drawings has been examined in previous research; mean completion time for
the copy phase ranged from 9 to 14 minutes depending on age, with standard
deviations of 4 to 7 minutes, on both the BGSS and the Koppitz 2 (Brannigan & Decker,
2003; Reynolds, 2007). Time is often a reported variable in the testing interpretation and
has been associated with number of errors and willingness to perform the test accurately.
Brannigan and Decker (2003) conducted studies during their research
standardization of the BGSS to investigate the relationship between that system and the
Koppitz 2. Correlations with that system and the BGSS were .80 for the copy phase and
.51 for the recall phase. The lowest Cronbach’s alpha of the Koppitz 2 as compared to the
BGSS based on age ranges was found in the 5-year age group at 0.77. In other aspects of
the norming study of the BGSS, age, gender and race variables were considered. When
divided into gender and racial groups, the Cronbach’s coefficients on the BGSS were
found to be approximately 0.90 for ages 8 and above (Brannigan & Decker, 2003). These
were calculated as internal measures of the BGSS during the norming process.
Test-retest reliability over time shows an average correlation
coefficient of r = 0.77, with a range of 0.73 to 0.85 in the normative sample. Interrater
reliability was found to be Cohen’s kappa = 0.91 and 0.93 for the two different protocols
of the Koppitz 2 (Reynolds, 2007). The correlation coefficients for the original Koppitz
Scoring System and BGSS were 0.80 for the Copy phase and 0.51 for the Recall phase
(Brannigan & Decker, 2003). It appears that the scoring systems are stable and measuring
similar constructs, but not to the point of redundancy.
Koppitz 2 scoring is based on specific aspects in the different designs that are
considered indicators for the presence or absence of perceptual difficulties. Tests are
scored on a point system based on examiner affirmative responses to the representation of
a specific item (for example, for Design 5, 1 point = yes, the two items touch or nearly
touch). There is a possibility of 45 correct in the Koppitz 2 raw score. Raw scores are
then converted to standard scores and are the basis for the Visual Motor Integration score.
The Koppitz 2 extracts a further score from the Bender drawings which is known
as the Koppitz 2 Emotional Indicators (EI). This score is based on the use of the BGII as
a projective indicator to identify children with severe emotional problems. This usage
derived from research data collected by Koppitz and others over many years (Reynolds,
2007). Koppitz identified 12 emotional indicators that she believed were of considerable
significance to those children with severe emotional problems and not a reflection of
intelligence alone (Reynolds, 2007). Those 12 emotional indicators (EI) are (a) confused
order, (b) wavy line, (c) dashes for circles, (d) progressive increase in drawing size, (e)
large size of drawings, (f) small size of drawings, (g) fine lines, (h) overworked or
reinforced lines, (i) second attempts at drawing a design, (j) expansion, (k) box around a
design, and (l) spontaneous elaboration or additions to the design (Reynolds, 2007). In
developing the Koppitz 2, Reynolds added the EI as a supplemental test based on the
original Koppitz research and results (Koppitz, 1963, 1971, 1975).
The Kiddie Schedule for Affective Disorders and Schizophrenia for School-Age
Children (K-SADS). This measure is a semistructured interview that was developed by
Puig-Antich and Chambers (1978). The KSADS allows the examiner to systematically
inquire about symptoms of psychopathology in children and adolescents. Kaufman and
others modified the original KSADS so that it also inquires about present and past
episodes of psychopathology, becoming the KSADS-PL (Kaufman, Birmaher, & Brent,
2003). This version demonstrates adequate reliability and validity as a clinical diagnostic
measure (Kaufman et al., 2003). The Washington University KSADS (WASH-U-
KSADS) version was further modified to include additional symptoms and associated
features of depression and mania (Geller et al., 2001). The WASH-U-KSADS-PL+
combines the KSADS-PL and the WASH-U-KSADS. Additionally, the WASH-U-KSADS-PL+
includes the Child Depression Rating Scale (Geller et al., 2001) and the
Young Mania Rating Scale (Gracious et al., 2002). The WASH-U-KSADS-PL+ was the
modification used in the current study and considered the reference standard in childhood
bipolar research. In the current study, interrater reliability was established by training all
research assistants to attain a Kappa coefficient > 0.85 at the symptom level and Kappa =
1.0 at the diagnosis level. To do so, new research assistants were initially required to
score five administrations with a certified rater followed by five interview
administrations leading and scoring the interview with participation of a certified rater.
Passing was scored at an item level Kappa > 0.85.
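The interrater criterion described above can be illustrated with a short calculation of
Cohen's kappa for two raters. This is a minimal sketch, not the study's own procedure:
the function name and the example ratings are hypothetical, and the 0.85 threshold is
the one stated in the text.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal category counts.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical symptom-level ratings (1 = symptom present, 0 = absent).
trainee = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
certified = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]

kappa = cohens_kappa(trainee, certified)
passed = kappa > 0.85  # passing criterion stated in the text
print(f"kappa = {kappa:.2f}, passed = {passed}")
```

In this made-up example the raters agree on 9 of 10 items, yet kappa falls below the
passing criterion because half of that agreement would be expected by chance.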
Final diagnoses were given via the Longitudinal Evaluation of All Available Data
(LEAD) process (Spitzer, 1983). The process involves a formal review of all data by an
expert or team of experts. Raters were blind to the final diagnoses when scoring the
research questionnaires and the BGSS; these results were later integrated with family
history, prior treatment history, prior testing history, and other clinical observations
under the supervision of licensed clinical psychologists. LEAD was conducted in person,
by telephone conferencing, or by video conferencing, with all raters reviewing the results
of the WASH-U-KSADS-PL+ along with child history, family history, supplemental
information, and other screening results. Final diagnoses were then determined by
consensus among the raters.
The Child Behavior Checklist (CBCL) and Youth Self Report Form (YSR).
These measures by Achenbach and Rescorla (2001) are rating scales of common Axis I
psychopathology as rated by caregivers and by youth self-report. The CBCL and the YSR
assess a broad range of behavioral symptoms found among children with emotional
difficulties (Achenbach, 1991). The CBCL and YSR were developed to serve as components
of a multiinformant empirically based assessment. The CBCL was revised in 2001, which
resulted in two different forms, the CBCL for ages 1 to 5 (CBCL/1-5) and the CBCL
for ages 6 to 18 (CBCL/6-18; ASEBA, 2006). Caregivers and youth respond to each item
using the past 6 months as the time frame. Responses consist of a 3-point Likert-type
scale (0 to 2), with 0 indicating never true, 1 indicating sometimes true, and 2 indicating
always true. Raw sum scores are transformed to T scores (M = 50, SD = 10) (Rescorla &
Wagner, 2001). The CBCL and YSR yield eight syndrome scales and three general
scales. The eight syndrome scales are aggressive behavior, anxiety/depression, attention
problems, rule-breaking behaviors, social problems, somatic complaints, thought
problems, and withdrawn/depressed (ASEBA,
2006). The three general scales are internalizing, externalizing, and total problems. High
T scores on the problem scales are indicative of pathology. T scores higher than 70 are in
the clinical range, T scores of 64 to 70 are in the borderline clinical range, while less than
64 are in the healthy range (Achenbach & Rescorla, 2001). While the CBCL and YSR are
widely used and considered valid measures of psychopathology, some critique their use
for having limited response choices and the inability of the clinician to query responses,
thus limiting the amount of clinical information gathered (Barker & Pistrang, 2004).
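The T-score ranges described above can be written as a small lookup. This is a minimal
sketch that uses the cut points exactly as stated in this section; the function name is
hypothetical, and the published manual should be consulted for scale-specific cut points.

```python
def cbcl_range(t_score):
    """Classify a CBCL/YSR problem-scale T score using the ranges given in the text."""
    if t_score > 70:
        return "clinical"
    if t_score >= 64:
        return "borderline clinical"
    return "healthy"


# Hypothetical T scores for illustration only.
for t in (58, 64, 70, 75):
    print(t, cbcl_range(t))
```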
Examiners
The researchers were predominantly predoctoral interns participating as part of
American Psychiatric Association predoctoral internship research requirements. Five
research assistants were full-time employees who had undergone extensive training. All
research assistants were under the direct supervision of the principal investigator and
coinvestigators. The raters administered the protocols independently once deemed
reliably trained. The reliability of raters was established over 10 passed trials with a
veteran examiner until a 0.85 coefficient Kappa interrater reliability was obtained
(Findling et al., 2001, 2002, 2005; Youngstrom et al., 2001; Youngstrom et al., 2005).
Procedure
All youth and caregivers presenting for a mental-health intake were invited to
participate in a more detailed assessment during the clinical intake session for the general
clinic. If youth and caregivers agreed to participate (62% agreed; primary reason for not
participating was duration of the study), they were then scheduled to meet with a research
assistant (57% attended). The study was conducted with approval from the University
Hospitals of Cleveland and the Case Western Reserve University Institutional Review
Board. West Virginia University Institutional Review Board approval was also sought;
the study was granted exempt status because of the use of archival data (Appendix A).
Both caregivers and youth provided written and verbal consent.
The WASH-U-KSADS-PL+ was conducted individually and separately with the
youth and caregivers. The youth completed all other measures with a separate research
assistant while the caregiver was being interviewed. The caregiver completed all other
measures with another research assistant while the participant was being interviewed. The
BGII drawings were scored according to the BGSS and Koppitz 2 by both the research
assistant who administered the measure and by a research assistant blind to the interview
administration and testing results. No significant difficulties were reported by research
assistants during administration of the BGII. Youth completed the copy phase of the BGII
in 8 minutes and 32 seconds on average (SD = 4.3). As previously reported, prior research
has found mean copy-phase completion times of 9 to 14 minutes, depending on age, with
standard deviations of 4 to 7 minutes on both the BGSS and the Koppitz 2; these values
serve as a reference when interpreting results in relation to errors (Brannigan & Decker,
2003; Reynolds, 2007).
Data Analyses
Data from the ABACB study (Youngstrom, 2006) were analyzed in SPSS v. 16.0.
Data analyses were performed on the BGII GSS scores, BGII Koppitz 2 scores, the
CBCL syndrome and general scales, the WASH-U-KSADS-PL+ and LEAD results. The
CBCL syndrome and general scales were correlated using Pearson correlations with the
BGSS scores and Koppitz 2 scores to examine whether similar constructs were being
measured. The BGSS scores and the Koppitz 2 scores were compared to diagnoses using
independent sample t tests, ANOVAs, and Receiver Operating Characteristic (ROC) analyses.
Diagnoses were categorized into four hierarchical groups: Bipolar Spectrum Disorders,
Depressive Disorder, Behavior Disorders, and Other Diagnoses. For example, if a youth
was diagnosed with Bipolar II and ADHD-Combined type, then he/she would be placed
in the Bipolar Spectrum Disorders group (Youngstrom et al., 2001).
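The hierarchical grouping rule can be sketched as a priority lookup: a youth is placed in
the highest-ranked group that any of his or her diagnoses falls into. The group order
follows the text; the diagnosis-to-group mapping below is hypothetical and incomplete,
and placing ADHD under Behavior Disorders is an assumption made only for illustration.

```python
# Group order, highest priority first, as listed in the text.
GROUP_PRIORITY = [
    "Bipolar Spectrum Disorders",
    "Depressive Disorder",
    "Behavior Disorders",
    "Other Diagnoses",
]

# Hypothetical mapping for illustration; the study's actual mapping is not given here.
DIAGNOSIS_TO_GROUP = {
    "Bipolar I": "Bipolar Spectrum Disorders",
    "Bipolar II": "Bipolar Spectrum Disorders",
    "Major Depressive Disorder": "Depressive Disorder",
    "ADHD-Combined": "Behavior Disorders",
    "ODD": "Behavior Disorders",
    "Conduct Disorder": "Behavior Disorders",
}


def hierarchical_group(diagnoses):
    """Return the highest-priority group among a youth's diagnoses."""
    groups = {DIAGNOSIS_TO_GROUP.get(d, "Other Diagnoses") for d in diagnoses}
    for group in GROUP_PRIORITY:
        if group in groups:
            return group
    return "Other Diagnoses"  # fallback for an empty diagnosis list (assumption)


# The example from the text: Bipolar II plus ADHD-Combined type.
print(hierarchical_group(["Bipolar II", "ADHD-Combined"]))
```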
Significant ANOVA results were further explored using post hoc tests; Tukey’s
HSD was used if variances among groups could be considered equal. Games-Howell was
used if variances among groups could not be considered equal. Holm’s step-down
correction procedure was applied to the independent-samples t tests comparing the
hierarchical diagnosis groups.
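Holm's step-down procedure can be sketched in a few lines: the p values are sorted, and
the smallest is tested against alpha / m, the next against alpha / (m - 1), and so on,
stopping at the first nonsignificant result. The function name and the p values below
are hypothetical; they are not results from this study.

```python
def holm_stepdown(p_values, alpha=0.05):
    """Return a list of booleans (reject H0?) in the original order of p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p values are retained
    return reject


# Hypothetical p values for four comparisons on one scale.
p_values = [0.010, 0.015, 0.030, 0.200]
holm = holm_stepdown(p_values)
bonferroni = [p <= 0.05 / len(p_values) for p in p_values]
print("Holm:      ", holm)
print("Bonferroni:", bonferroni)
```

With these illustrative values Holm's procedure rejects two hypotheses where the
Bonferroni correction rejects one, which is the sense in which it preserves more power.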
The evaluator could then assess the BGSS and Koppitz 2 level of specificity and
sensitivity with regard to specific measurement results as well as common childhood
diagnoses by testing different cut-off scores related to those significant ANOVA findings.
If any of the hypothesized results had been found, they could have demonstrated the
utility of the Bender in diagnosis in evidence-based practice (Youngstrom & Duax,
2005). The standard of using significant ANOVA findings in evidence-based practice
had already resulted in the development of Receiver Operating Characteristic (ROC)
analyses to examine the diagnostic sensitivity and specificity of the assessment measures
(Altman & Bland, 1994b; Frazier, 2006; Jaeschke et al., 1994; Youngstrom, 2006).
The ROC analysis plots sensitivity and false alarm rate (1-specificity) to aid in the
interpretation of scores and can separate cases from those identified as noncases.
Sensitivity is the proportion of individuals previously classified as positive for a
diagnosis whom the test also identifies as positive (Altman & Bland, 1994a; Choi, 1998).
A false alarm is a positive test result for an individual who does not have the disorder;
an incorrect diagnosis may therefore be assumed when the child does not actually
have the disorder. Specificity is the proportion of individuals without the diagnosis
whom the test correctly identifies as negative (Altman & Bland, 1994a; Choi, 1998). An
ROC curve plots the sensitivity/false
alarm rate for each score, thus producing a curve. A straight diagonal line indicates a 50/50 chance
of being given the specific diagnosis and is considered random. Tests that do not
discriminate above the level of random chance are of no use to clinicians (Zweig &
Campbell, 1993). ROC curves examined diagnostic efficiency by comparing the
sensitivity and false alarm rate (1 – specificity) for each score (Altman & Bland, 1994a).
ROC results are determined by nonparametric methods that result in a decimal fraction
that represents the area under the curve (AUROC). An AUROC of .50 would indicate
chance performance.
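The quantities defined above can be computed directly from scores and reference
diagnoses. The sketch below is illustrative only: the data are hypothetical, a score at
or above the cut point is assumed to count as a positive test, and the AUROC is obtained
with the nonparametric pairwise (Mann-Whitney) formulation rather than SPSS output.

```python
def sensitivity_specificity(scores, has_diagnosis, cut):
    """Sensitivity and specificity when score >= cut is treated as a positive test."""
    tp = sum(1 for s, d in zip(scores, has_diagnosis) if d and s >= cut)
    fn = sum(1 for s, d in zip(scores, has_diagnosis) if d and s < cut)
    tn = sum(1 for s, d in zip(scores, has_diagnosis) if not d and s < cut)
    fp = sum(1 for s, d in zip(scores, has_diagnosis) if not d and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(scores, has_diagnosis):
    """Probability that a random case outscores a random noncase (ties count half)."""
    cases = [s for s, d in zip(scores, has_diagnosis) if d]
    noncases = [s for s, d in zip(scores, has_diagnosis) if not d]
    if not cases or not noncases:
        raise ValueError("need at least one case and one noncase")
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in noncases
    )
    return wins / (len(cases) * len(noncases))


# Hypothetical error scores: first five youths have the diagnosis, last five do not.
scores = [30, 27, 25, 22, 20, 24, 18, 15, 21, 12]
has_diagnosis = [True] * 5 + [False] * 5

sens, spec = sensitivity_specificity(scores, has_diagnosis, cut=23)
area = auroc(scores, has_diagnosis)
print(f"sensitivity = {sens:.2f}, false alarm rate = {1 - spec:.2f}, AUROC = {area:.2f}")
```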
In evidence-based practice, clinical decision making uses the diagnostic
likelihood ratios derived from ROC analysis to improve the likelihood of a clinically
significant diagnosis. Mathematically, this is a result of Bayes’ theorem. Visually, a
nomogram can be used to simplify the process in clinical practice (Frazier, 2006;
Jaeschke et al., 1994; Youngstrom, 2006). The use of diagnostic likelihood ratios, ROC,
and nomograms is relatively new to the field of psychology, yet these tools have been
used in determining the utility of medical procedures and tests for years (Frazier, 2006).
Diagnostic likelihood ratios were calculated to aid in understanding the clinical
significance. Using Bayes’ Theorem, the diagnostic likelihood ratio is multiplied by
pretest odds (or base rate) to determine the posttest odds. For ease of clinical use, the
odds are presented as probabilities in percent form. If one uses a binary cut point (e.g.,
clustering all clients below a score and all others above a score), then the positive
diagnostic likelihood ratio is sensitivity divided by (1 - specificity) for the cut point score.
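The likelihood ratios for a binary cut point follow directly from sensitivity and
specificity. The sketch below is a minimal illustration with hypothetical values; the
function names are not from any statistical package, and the negative ratio uses the
standard definition, (1 - sensitivity) / specificity.

```python
def positive_dlr(sensitivity, specificity):
    """Positive diagnostic likelihood ratio: sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def negative_dlr(sensitivity, specificity):
    """Negative diagnostic likelihood ratio: (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity


# Hypothetical sensitivity and specificity at one cut point.
sensitivity, specificity = 0.60, 0.80
dlr_pos = positive_dlr(sensitivity, specificity)
dlr_neg = negative_dlr(sensitivity, specificity)
print(f"DLR+ = {dlr_pos:.2f}, DLR- = {dlr_neg:.2f}")
```

A positive ratio above 1 raises the odds of the diagnosis after a positive test, and a
negative ratio below 1 lowers them after a negative test.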
In the current research and prior to statistical analysis, a power analysis was
conducted using G*Power (Erdfelder, Lang, & Buchner, 1996). Power of 80% was
calculated a priori for the correlational results using a moderate correlational effect
size (r = .30). Results indicated that a sample of 67 clients was required. Power of 80%
was also calculated a priori for the between-group differences using independent-samples
t tests with a moderate effect size (Cohen’s d = 0.5). Results indicated that a sample of
more than 67 clients was sufficient; therefore an N of 75 or larger was considered
sufficient.
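For readers who want to approximate the a priori sample size for a correlation by hand,
the Fisher z approximation is a common shortcut. This is a sketch under stated
assumptions and not a reproduction of the G*Power run: G*Power uses an exact routine, so
its answer can differ slightly, and the text does not state whether a one- or two-tailed
test was specified, so the function takes that as a parameter.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate N needed to detect a correlation r (Fisher z approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_tailed else z(1 - alpha)
    z_beta = z(power)
    # N = ((z_alpha + z_beta) / atanh(r))^2 + 3, rounded up to a whole participant.
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


n_two_tailed = n_for_correlation(0.30, two_tailed=True)
n_one_tailed = n_for_correlation(0.30, two_tailed=False)
print(n_two_tailed, n_one_tailed)
```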
This chapter has detailed the methods and procedures used in this study along
with the rationale for these methods. As stated, the data were gathered over
approximately one year within a larger 5-year study. The study was designed to fit into
the framework of the larger study, and the data collected were archival.
Chapter 4: Results
The present study was conducted to examine the clinical utility of the Bender
Visual Motor Gestalt Test II (BGII) as a diagnostic discriminator for young children and
adolescents. Two different methods for scoring the BGII, the Global Scoring System
(BGSS) and the Koppitz 2 Scoring System, were compared to determine whether the
BGII is an adequate measure for identifying pathology in children at initial intake.
Participants in the present study were administered the BGII during an interview process
to assess clinical diagnostic accuracy among a variety of screening instruments. All data
were converted to T scores with a mean of 50 and standard deviation of 10 for analysis.
Data from the current study were analyzed in SPSS v. 16.0. Data analyses were
performed on the BGII GSS scores, BGII Koppitz 2 scores, the CBCL syndrome and
general scales, the WASH-U-KSADS-PL+ and LEAD results. The CBCL syndrome and
general scales were compared with the BGSS scores and Koppitz 2 scores using Pearson
correlations to determine whether similar constructs were being measured. The BGSS
scores and the Koppitz 2 scores were then compared to diagnoses using independent
samples t tests, ANOVAs, and Receiver Operating Characteristic (ROC) analyses.
Research Question 1
When compared to a common screening instrument such as the CBCL, the BGII
using the BGSS and Koppitz 2 scoring methods will be useful in the diagnostic screening
of a clinically referred population of children. Results of the BGSS scores are given first
followed by results of the Koppitz 2 scoring. The BGSS scoring of the BGII was
compared to the CBCL using Pearson correlations. The BGSS Copy T scores were not
found to be significantly related to the CBCL Internalizing, Externalizing, and Total
Problems T scores. The BGSS Recall T scores (r = -0.26, p < 0.01, r² = .07) have
a negative relationship to the CBCL Total Problems T scores. Therefore, an increase in
the total number of behavior problems was associated with a decrease in the recall scores.
The BGSS Recall T scores demonstrated a significantly negative relationship to the
CBCL Externalizing T scores (r = -0.23, p = 0.01, r² = 0.05). This indicates that as scores
on externalizing or acting-out behaviors increased, child recall scores decreased. The
CBCL Internalizing T scores were found not to be significantly related to the BGSS Copy
T scores (r = 0.002, p = 0.98) or to the BGSS Recall T scores (r = -0.08, p = 0.36). The
Internalizing and Externalizing CBCL scores were significantly correlated with each other
(r = 0.44, p < 0.01, r² = 0.19). Not surprisingly, both the Internalizing and Externalizing
scores were highly correlated with the Total Problems T scores (r = 0.76, r = 0.89, p <
0.01, respectively). Table 1 displays the complete correlation matrix for the CBCL scores
and the BGII using the BGSS scoring system.
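The strength-of-association values reported alongside these correlations are the squared
correlation coefficients, that is, the proportion of variance the two measures share.
The short sketch below recomputes them from the r values reported in the text; the
function name and dictionary labels are illustrative.

```python
def shared_variance(r):
    """Proportion of variance shared by two measures: the squared correlation."""
    return r * r


# r values as reported in the text.
reported = {
    "Recall vs. Total Problems": -0.26,
    "Recall vs. Externalizing": -0.23,
    "Internalizing vs. Externalizing": 0.44,
}

for label, r in reported.items():
    print(f"{label}: r = {r:+.2f}, r^2 = {shared_variance(r):.2f}")
```

Even the statistically significant recall correlations account for well under 10% of
the variance in the CBCL scores.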
Results of the CBCL on the larger study indicate that the clinically referred
groups of children were in the clinically borderline range for emotional problems: CBCL
Total Problems (M = 68.74, SD = 8.82, N = 782), CBCL Externalizing Behaviors (M =
69.63, SD = 9.72, N = 782), and CBCL Internalizing Behaviors (M = 63.48, SD = 10.34,
N = 782, in the larger study). This is between one and two standard deviations above the
mean on the CBCL based on the larger 5-year study results. This is indicative of a high
number of symptoms as reported by the parent or primary caregiver at the time of initial
intake.
In the sample group, the age-adjusted BGII Copy T scores (M = 52.10, SD = 10.85,
n = 75) and Recall T scores (M = 50.98, SD = 11.83, n = 75) are in the average range
for the BGSS and are not considered clinically significant. The overall mean differences
between the CBCL and BGII are indicative of a significant difference between the
measures. The mean CBCL and BGSS scores were compared using a dependent-samples
t test because both are expressed as T scores and are therefore comparable. The results
suggest that the BGSS indicates, on average, fewer visual motor problems in the sample
than the CBCL indicates psychopathology. However, it is important to note that the BGII
was completed by the child (client) and the CBCL was completed by the parent or primary
caretaker.
Pearson correlation coefficients were computed between the BGSS Copy and
BGSS Recall T scores and the CBCL. The CBCL scores and results are presented in
Table 1. The BGSS Copy T score has a moderately high positive relationship with the
BGSS Recall T score (r = 0.49, p < .01). In Table 1, BGSS Recall scores were
moderately negatively related to the CBCL Externalizing scores (r = -0.23, p < .01) and
Total Problems scores (r = -0.26, p < .01). These results suggest a negative relationship
between clients’ ability to recall drawings and the number of externalizing behaviors
reported by their caregivers as well as the total number of symptoms endorsed.
In the sample group overall, time to complete the BGII showed a moderate
positive association with BGII GSS Copy T scores, r = .34, p < .05. Time to complete the
BGII was not significantly related to any of the CBCL T scores, all p > .05.
Table 1
Pearson Product-Moment Correlation Matrix Between the BGSS Copy and Recall Scores
and the CBCL Main Scores
                       Copy    Recall   CBCL          CBCL          CBCL
                                        internalize   externalize   total problems
BGSS copy              -
BGSS recall            .49     -
CBCL internalizing     .01     -.09     -
CBCL externalizing     -.16    -.23*    .44**         -
CBCL total problems    -.16    -.26**   .76**         .86**         -
*p < .05, **p < .01
Further analyses of the BGSS and CBCL were performed using the syndrome
scales of the CBCL; Table 2 displays the complete correlation matrix for the CBCL
syndrome scales and the BGSS scores. The CBCL syndrome categories were (a)
aggressive behavior, (b) anxiety/depression, (c) attention problems, (d) rule-breaking
behaviors, (e) social problems, (f) somatic complaints, (g) thought problems, and (h)
withdrawn depressed.
The CBCL attention problems scale was found to be negatively related to the
BGSS copy scores, (r = -0.14) and negatively related to BGSS recall scores (r = -.24, p =
.01). The strength of the association of the relationship between CBCL attention
problems and Bender Copy scores was = 0.42. The BGSS recall score and the CBCL
thought problems syndrome category were positively related (r = -.26, p = .01),
suggesting a relationship between the ability of the child to recall the drawings and
reported thinking difficulties or reality-based behaviors.
The Bender Recall scores were found to correlate to several of the subcategories
of the CBCL. The Bender Recall scores were found to have a weak negative correlation
with the subcategories of anxiety-depression, social problems, thought problems,
attention problems, and rule breaking, all p < .05. Again this suggests a closer connection
between BGSS recall scores and symptom scores than the copy phase of the BGSS.
Further analyses of the BGSS method of scoring with the CBCL were conducted
on the clinical diagnoses scales. Results revealed that BGSS copy scores were not found
to be significant for affective disorder, anxiety disorder, ADHD or oppositional defiant
disorder. However, the BGSS recall scores were again found to be significantly related
to ADHD scores on the CBCL (r = -.19, p = .05).
This suggests that as BGSS recall scores decrease, ADHD subscale scores increase. In
other words, poor performance on the recall is associated with higher ADHD scores.
Findings of the BGSS compared to the CBCL clinical diagnoses are found in Table 3.
Table 2
Pearson Correlational Matrix of the BGSS Scores and the CBCL Syndrome Scores
Subcategory            GSS     GSS      CBCL          CBCL          CBCL
                       copy    recall   internalize   externalize   total problems
Anxiety/depression     .07     -.13     .85**         .40**         .68**
Attention problems     -.14    -.24**   .47**         .55**         .71**
Aggressive behaviors   -.20*   -.18     .45**         .93           .82**
Social problems        -.07    -.17     .61**         .55**         .75**
Rule breaking          .01     -.15     .33**         .86**         .71**
Thought problems       -.13    .26**    -.60**        .53**         -.74**
Withdrawn behaviors    .09     .13      .77**         .31**         .56**
Somatic concerns       .04     -.10     .69**         .26**         .50**
*p < .05, **p < .01
Table 3
Pearson Correlational Matrix Comparing BGSS and CBCL Diagnostic Scores
                     Affective   Anxiety    ADHD     ODD
                     disorder    disorder
GSS copy scores      -.09        -.17       -.18     -.07
GSS recall scores    -.16        -.08       -.19*    .11
*p < .05
To determine whether the Koppitz 2 scoring of the BGII was associated with
psychopathology, Pearson correlational coefficients were computed between the Koppitz
2 and the CBCL results using their respective T scores to determine if there were any
relationships. The initial analysis compared the Koppitz 2 total VMI and total EI scores
with the CBCL scores for internalizing, externalizing and total problems (Table 4). These
findings indicated a weak but significant relationship between CBCL Total Problem and
Koppitz EI scores. No other significant relationships were found between the different
Koppitz and CBCL testing measures.
Table 4
Pearson Correlational Matrix Comparing CBCL and Koppitz 2 Scores
               CBCL            CBCL            CBCL
               internalizing   externalizing   total problems
Koppitz EI     .15             .06             .18*
Koppitz VMI    -.08            -.16            -.19
*p < .05
Further analyses comparing the Koppitz 2 findings with the CBCL subcategories
resulted in the following findings, as shown in Table 5. First, the Koppitz EI scores were
found to be negatively related to the Koppitz Total Score (r = -0.48, p < .05) and the
Koppitz VMI Scores (r = -.31, p < .05). Koppitz Total Scores were found to be correlated
with the Koppitz VMI Scores (r = 0.88, p < .05).
Koppitz 2 Total Scores were also found to be correlated with social problems
(r = -.23, p < .05) and aggressive problems on the CBCL (r = -.24, p < .05). Koppitz 2 EI
scores were correlated with social problems (r = 0.28, p < .05) and attention problems
(r = .28, p < .05). The Koppitz 2 scoring methods were not found to be significantly
correlated with anxiety-depression, withdrawal, somatic complaints, thought problems,
rule-breaking group, internalizers, externalizers, or total problems groups. This suggests a
connection between the number of Koppitz 2 EI exhibited by the child and the number of
social and attention problems described by the caregiver on the CBCL. Koppitz 2 Total
Scores and VMI Scores were strongly related (r = 0.88, p < .05), as previously suspected.
Table 5
Pearson Correlation Matrix Comparing Koppitz 2 Total Score, Total Emotional
Indicators, and Visual Motor Index to the CBCL Syndrome Scales
                         1     2     3     4     5     6     7     8     9     10    11    12    13
1. Koppitz total score   -
2. Emotional indicators  -.48* -
3. VMI                   .88*  -.31* -
4. Anxious-depressed     -.07  .19   .01   -
5. Withdrawn             .22   -.10  .19   .50*  -
6. Somatic complaints    .11   -.01  .18   .43*  .41*  -
7. Social problems       -.23* .28*  -.12  .60*  .17   .35*  -
8. Thought problems      -.10  .08   -.09  .55*  .20   .49*  .61*  -
9. Attention problems    -.22  .28*  -.11  .47*  .16   .31*  .67*  .62*  -
10. Rule breaking        -.09  .09   -.09  .37*  .21   .34*  .51*  .51*  .46*  -
11. Aggressive           -.24* .19   -.20  .53*  .11   .24*  .75*  .63*  .65*  .69*  -
12. Internalizing        .00   .09   .06   .83*  .75*  .69*  .51*  .51*  .40*  .38*  .42*  -
13. Externalizing        -.21  .15   -.17  .51*  .16   .27*  .69*  .64*  .63*  .84*  .95*  .42*  -
14. Total problems       -.22  .22   -.14  .73*  .37*  .51*  .79*  .78*  .74*  .73*  .85*  .71*  .89*
* p < .05
Research Question 2
When compared to the research instrument of the WASH-U-KSADS-PL+, scores
derived from the BGII BGSS and the Koppitz 2 scoring systems will be found to be
valid predictors of pathological symptom severity among the children studied. Again the
results section is ordered by the two systems used, BGSS first and Koppitz 2 second.
Results of the BGII BGSS were compared to the diagnostic categories of the
KSADS-PL+ using Pearson Product-Moment correlations. The current study compared
the different scoring results with specific pediatric diagnostic categories on the KSADS-
PL+ to determine if the BGII could be useful in identification of pathology. Although
when compared with the KSADS-PL+ diagnoses the BGSS results showed no significant
correlations (Table 6), additional analyses found that within the WASH-U-KSADS-PL+
semistructured interview, the ADHD diagnosis was positively correlated with the ODD
diagnosis (r = .27, p = .01, r² = .07).
The Koppitz 2 scoring results were found to be associated with the diagnostic
findings on the KSADS-PL+. The KSADS-PL+ diagnoses were further compared to the
Koppitz 2 Subtotals, VMI and EI using the Pearson Correlational Coefficient analyses as
presented in Table 7. When comparing the Koppitz 2 VMI scoring results with that of the
KSADS-PL+ diagnoses, there were no significant correlations found at the p = .05 level.
However, when comparing the Koppitz 2 Emotional Indicators with different KSADS-PL+
diagnoses, there was a positive correlation between ADHD diagnoses and the EI
scores (r = 0.30, p < 0.01), with a strength of association of r² = .08.
Table 6
Pearson Correlation Matrix Comparing the BGII, BGSS and Koppitz 2 Scores and
KSADS-PL+ Symptomology
________________________________________________________________________
                           1     2     3     4     5     6     7     8     9
1. Time to complete        -
2. BGSS score              .34*  -
3. BGSS recall             .20   .53*  -
4. Koppitz 2 total score   .16   .72*  .50   -
5. Emotional indicators    .04   -.23* -.33* -.48  -
6. Koppitz 2 VMI           .25*  .82*  .51*  .88*  -.31* -
7. KSADS depression        -.10  .02   .14   .14   -.19  .01   -
8. KSADS mania             -.06  -.10  -.05  .04   -.18  -.07  .50*  -
9. KSADS ODD               -.15  -.04  -.21  .04   -.09  .04   -.03  .32   -
________________________________________________________________________
* p < .05, n = 75.
Table 7
Pearson Product-Moment Correlations of the Koppitz 2 Scores and KSADS-PL+
Diagnoses.
                      Bipolar/   ADHD/    ODD/     CD disorder/
                      KSADS      KSADS    KSADS    KSADS
Koppitz VMI T score   -.05       -.12     -.11     .08
Koppitz EI T score    -.09       .30**    .03      -.18
*p < .05, **p < .01
Research Question 3
There will be significant relationships among the scoring results of the Bender
Visual Motor Gestalt II and Koppitz 2 depending on diagnostic categories when related
to final LEAD results. Children with bipolar disorders will differ from those with
unipolar disorder, anxiety disorders, behavior disorders, and ADHD on scores derived from
the BGSS or the Koppitz 2 scoring systems.
Using Holm’s step-down correction procedure, independent samples t tests
were conducted to determine whether there was a significant difference between
KSADS-PL+ diagnostic categories on each of the separate scales. The scales are (a) BGSS
copy scores, (b) BGSS recall scores, (c) Koppitz 2 total score, (d) total emotional
indicators, and (e) Koppitz 2 VMI. The use of Holm’s step-down correction was preferred
because it
maintains alpha at close to .05 overall and protects statistical power more than the
common Bonferroni Correction method. When using this approach, there are four
comparisons being made on each scale based on the following categories. The categories
are (a) bipolar disorder versus all others, (b) ADHD versus all others, (c) conduct
disorder versus all others and (d) ODD versus all others.
The BGSS Copy scores (t (73) = 1.05, p = .29), BGSS Recall Scores (t (73) =
0.07, p = .94), Koppitz 2 Total Score (t (73) = 0.32, p = .78), Koppitz 2 Emotional
Indicators (t (73) = 0.14, p = .10), and Koppitz 2 VMI (t (73) =1.09, p = .42) did not
significantly differentiate between those diagnosed with bipolar (n = 9) and all other
disorders (n = 66). ROC indicated the BGSS Copy score (AUROC = .40), BGSS Recall
score (AUROC = .49), Koppitz 2 Total Score (AUROC = .47), Koppitz 2 Emotional
Indicators (AUROC = .36), and the Koppitz 2 VMI (AUROC = .41) did not predict
bipolar disorder significantly better than chance, at p > .05. The BGSS Copy score (t (73)
= 1.66, p = .10), BGSS Recall Scores (t (73) = 1.69, p = .10), Koppitz 2 Total EI (t (73) =
1.79, p = .07), and Koppitz 2 VMI (t (73) = 1.34, p = .19) did not significantly
differentiate between ADHD (n = 47) and all other disorders (n = 28). On the Koppitz 2
Total Score, children and adolescents diagnosed with ADHD (M = 19.47, SD = 8.44) had
significantly lower scores than all others (M = 24.93, SD = 9.49), (t (73) = 2.56, p =
.01). Receiver Operating Characteristic (ROC) analyses were also conducted to determine
whether any aspects of the researched scoring systems could predict any specific
diagnosis compared to those not having the diagnosis in the sample group. For these
analyses, an error score was created for the Koppitz 2 Total Score by subtracting the
actual score from the total possible score. This score was created to aid in the
interpretation of the ROC analysis because it moved the ROC curve to the conventional
top left instead of the bottom right of the diagram. Figure 1 displays the ROC curve,
which demonstrates that the number of errors significantly predicted an ADHD diagnosis
over all other diagnoses (AUROC = .67, 95% CI = [.54, .80]).
Using diagnostic likelihood ratios, sample participants who made more than 23
errors have a positive diagnostic likelihood ratio (DLR) of 2. Following Bayes' theorem,
posterior odds equal the prior odds multiplied by the DLR. In a sample in which 50% of the
youth have a diagnosis of ADHD, the prior odds are 1:1, so the posterior odds are 2:1.
Expressed as a probability, odds of 2:1 are 66%. This is calculated by the following
equation: probability = odds / (1 + odds); using the current data, 2 / (1 + 2) = .66. A
positive test result therefore increases the probability of the child truly having an
ADHD diagnosis from 50% to 66%. Negative diagnostic likelihood ratios can also be
calculated by dividing (1 - sensitivity) by the specificity for the cut point score.
Clients making fewer than 23 errors have a negative diagnostic likelihood ratio of 0.7.
Again using Bayes' theorem, the prior or base rate odds (1.0) are multiplied by the DLR
(0.7), which yields posterior odds of 0.7; these convert to 41% using the formula above:
.7 / (1 + .7) = .41. This indicates that in the same sample, in which 50% of the youth
have an ADHD diagnosis, a test result of fewer than 23 errors decreases the probability
of a child having the diagnosis of ADHD to 41%.
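The conversion from a base rate and a diagnostic likelihood ratio to a posterior probability can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the function names are hypothetical, the DLR values of 2 and 0.7 and the 50% base rate come from the text, and the 20% base rate in the last line is an assumed value added only to show how the result depends on the prior.

```python
def probability_to_odds(probability):
    """Convert a probability (0 < p < 1) to odds."""
    return probability / (1.0 - probability)


def odds_to_probability(odds):
    """Convert odds to a probability: odds / (1 + odds)."""
    return odds / (1.0 + odds)


def posterior_probability(prior_probability, dlr):
    """Bayes' theorem in odds form: posterior odds = prior odds * DLR."""
    prior_odds = probability_to_odds(prior_probability)
    posterior_odds = prior_odds * dlr
    return odds_to_probability(posterior_odds)


# Values taken from the text: 50% base rate, DLR+ = 2, DLR- = 0.7.
after_positive = posterior_probability(0.50, 2.0)
after_negative = posterior_probability(0.50, 0.7)

# Hypothetical base rate of 20%, assumed for illustration only.
after_positive_low_base = posterior_probability(0.20, 2.0)

print(f"Positive result, 50% base rate: {after_positive:.3f}")
print(f"Negative result, 50% base rate: {after_negative:.3f}")
print(f"Positive result, 20% base rate: {after_positive_low_base:.3f}")
```

At the 50% base rate the sketch gives about .667 and .412, which correspond to the 66% and 41% reported above. At the hypothetical 20% base rate the same positive result yields a probability of about .33, which shows why the base rate in a given practice matters when interpreting a test result.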
Figure 1. ROC of errors made using the Koppitz 2 scoring system to predict ADHD.
*Note: The solid diagonal line is chance (AUROC = .5).
The BGSS Copy score (t (73) = .99, p = .99), BGSS Recall Score (t (73) = .06, p =
.47), Koppitz 2 Total Score (t (73) = -1.77, p = .08), Koppitz 2 Emotional Indicators (t
(73) = 1.82, p = .08) and Koppitz 2 VMI (t (73) = -.62, p = .53) did not significantly
differentiate between conduct disorder (n = 30) and all other disorders (n = 45). An
ANOVA was performed using the mean of the BGSS Recall Scores across the KSADS-
PL+ groupings of diagnoses found during LEAD. BGSS Recall Scores were found not to
be significantly different with F (3,104) = 0.76, p = 0.97. Due to a nonsignificant
ANOVA and low shared variance among the different measures, additional analyses were
not performed on this relationship.
Subscores from the Koppitz 2 scoring system of the Bender Visual Motor
Gestalt Test II were compared with the final LEAD diagnostic groups to determine whether any
significant relationships existed. The ANOVA of the Koppitz 2 VMI scores was not
significant. An ANOVA performed with the Koppitz 2 Total EI scores was also
not significant. As previously shown in Table 5, the unipolar group had
reported fewer emotional indicators than the disruptive behavior group and, though not
significant, showed a difference between the cyclothymic/bipolar group and residual
group. The average difference is only one emotional indicator. However, this difference
of one emotional indicator is the difference between the normal range and the range of
concern (Reynolds, 2006).
Research Question 4
Using hierarchical groups of diagnoses, five one-way ANOVAs indicated that the
BGSS Copy Scores (F (3,71) = 0.56), BGSS Recall Scores (F (3,71) = 0.76), Koppitz 2
Total Score (F (3,71) = 0.46), Koppitz 2 Emotional Indicators (F (3,71) = 1.07) and
Koppitz 2 VMI (F (3,71) = 0.46) did not significantly differentiate sample participants
with (a) bipolar, (b) unipolar depression, (c) behavior disorders, or (d) any other
disorders, at p > .05. Due to nonsignificant ANOVAs and low shared variance among the
different measures, additional analyses were not performed on this relationship to
determine the level of specificity and sensitivity in relationship to clinical decision
making when using the Bender Gestalt II together with the BGSS for ROC. 

Chapter 5: Discussion
In this study, the relationship between two scoring systems of the Bender Visual
Motor Gestalt Test II and two empirically validated diagnostic instruments, the Child
Behavior Checklist and the KSADS-PL+, was evaluated. The primary goal of this study
was to determine the clinical utility of the Bender Gestalt II as a screening instrument in
the initial intake process for identification of psychopathology in children and
adolescents in a clinical setting. Specifically, is this nonverbal measure of visual motor
integration useful as a projective indicator of individual psychopathology in children?
Examination of the data indicates that aspects of both scoring systems for the Bender
Gestalt II, the BGSS and the Koppitz 2, have some limited utility in
identifying psychopathology in children.
Discussion of Hypotheses
The initial hypothesis that the BGII, when compared with the CBCL, would be
useful as a psychometric clinical screening instrument with children and adolescents was
weakly validated by the current findings. These results indicate that when compared to
the CBCL, the Bender Recall scores showed a significant negative correlation with the
CBCL Externalizing scores. That is, children who exhibit a greater number or frequency
of externalizing or acting out behaviors were found to have lower scores on the recall
subtest. Bender Recall scores and CBCL Internalizing scores, reflective of such
disorders as anxiety, depression and other dysfunctions that are not socially disruptive,
lacked statistical correlation. This is expected considering previous studies have
demonstrated that children who exhibit externalizing disruptive behaviors tend to have
more difficulty with short-term memory than those with internalizing disorders
(Kooistra, Crawford, Dewey, Cantell, & Kaplan, 2005; Raggi & Chronis, 2006).
When comparing the CBCL with the BGSS and the Koppitz 2 results, several
potentially useful findings did appear. Although the mean of the Bender results was
within the normal range upon initial review, the individual CBCL mean was in the
borderline clinical range. One possible reason for the difference between means is that
the children completed the Bender Gestalt II themselves, whereas the caregiver was
responsible for the CBCL results. In reviewing a series of reports, Youngstrom (2006)
reported previously that parents are the most accurate reporters of symptoms. It is thus
possible that the Bender is not assessing the same type of symptoms, because children report
differently than their caregivers. However, when subcategories of the test results were
evaluated, a level of correlation began to emerge. Bender Recall T Scores were found to
be associated with those of CBCL Externalizing and Total Problem scores. Results
indicate that those with higher Externalizing scores and Total Problem scores found it
more difficult to recall the presented Bender items. These recall items on the Bender
Gestalt II are related to short-term memory and accuracy of drawing (Brannigan &
Decker, 2003). This is likely due to the relationship between attention and short-term
memory, which is a difficulty commonly reported by caregivers and further supported by
the Bender performance on recall tasks.
On the CBCL syndrome scales, several further meaningful relationships can be
identified. Bender Recall scores tended to be negatively related to scores on
ADHD symptoms, as well as to reported aggressive behavior problems. This
was similar to the findings of Allen (2005) who found a relationship between youth with
ADHD and low recall memory. Allen further found lower Bender Recall scores were
positively associated with increased scores for rule-breaking behaviors. However, this
was not supported in the current study when compared with the CBCL rule-breaking
subcategory. In the current research, BGSS Copy and Recall scores were found to
be negatively associated with CBCL attention scores. It is not surprising that those with
difficulty paying attention would have a difficult time attending to accuracy of the
drawings and later recalling the drawing to reproduce. Previous researchers found similar
relationships with regard to the BGSS and diagnosis of ADHD and the specific symptom
of impulsivity in children (Allen, 2005; Oas, 1984). Allen found that ADHD-diagnosed
children performed more poorly overall on the Bender Gestalt II than normal children.
Oas had previously found that results for adolescents with impulse disorders were
significantly different from those designated as nonimpulsive on the Matching Familiar
Figures Test, a behavior rating scale. It is important to report that the associated strength
of most of these findings was considered weak but in the hypothesized direction. Since I
did not have a nonclinical group as control and comparison, and this was a consecutive
case series design, the current research findings are preliminary and not meant to be
conclusive.
A finding not previously reported but found in the current research is the strong
correlation of reported increase in thought problems on the CBCL and high BGSS recall
scores. The CBCL defines thought problems as obsessive thoughts, hallucinations,
strange behaviors and atypical sleep patterns. One would think thought problems would
impede the ability to attend and recall. One possible reason for this finding may
be that these children are highly aware of their surroundings or hypervigilant, but are not
outwardly expressing their thoughts and perceptions, thus resulting in their caregivers
misinterpreting their behavior as odd or thought disordered.
A further hypothesis of this study postulated that when compared to a research
instrument such as the KSADS-PL+, the results of the Bender Gestalt II and Koppitz 2 scoring
methods would be found to be valid measures of childhood pathological symptom
severity. The KSADS-PL+ semistructured interview process represents both a qualitative
and quantitative approach to data collection, which affords clinicians greater depth and
scope of diagnoses. However, results from the Bender Gestalt II Global Scoring system
were not found to be correlated with scores derived from the WASH-U-KSADS-PL+.
There were no statistically significant findings on the BGSS when compared with the
WASH-U-KSADS-PL+ for symptoms within the childhood diagnoses groups. As
previously discussed, the KSADS-PL+ allows for more parental input than the forced
choice standardized method of the CBCL and may have contributed to the lack of
relationship between these two variables, as well as the varying results among these
measures when further compared to the BGII scoring systems. Research comparing a
testing measure such as the Child Behavior Checklist and the General Behavior Inventory
with an interview method has been previously reported and the findings have been found
useful in comparing a more subjective clinical impression to an empirically valid
instrument (Youngstrom et al., 2000; Youngstrom et al., 2001; Youngstrom et al., 2005).
As previously discussed, the Koppitz 2 scoring system resulted in several
significant findings regarding individuals diagnosed with ADHD using the WASH-U-
KSADS-PL+ and the CBCL. As a screener for children with possible ADHD, the
Koppitz 2 may aid in clinician decision making and thus be clinically useful. However,
the Koppitz 2 Emotional Indicators (EI) were not found to be of benefit for diagnostic
purposes in my research. As previously noted, the Emotional Indicators are those items
that Koppitz recognized as highly indicative of pathology. Therefore, despite the weak
results in my initial research, the Bender Gestalt II as a screener for ADHD diagnostic
purposes in accompaniment with the WASH-U-KSADS-PL+ semistructured interview
could be an avenue for future research.
A third hypothesis suggested that significant relationships would be found
between the scoring systems of BGSS and the Koppitz 2 in relation to LEAD diagnostic
categories. The final diagnostic categories were reviewed and determined during the
formal LEAD process. These categories were defined previously as part of the bipolar
study and therefore delineated mood disorders into two separate categories, Unipolar/
Depression and Cyclothymic/ Bipolar Spectrum Disorders. The Koppitz 2 and BGSS
were unable to differentiate between the two mood disordered groups presented, although
originally this was hypothesized. In previous research Shapiro and Simpson (1995)
found that clients’ primary psychiatric diagnoses were unrelated to their Bender
performance. Yet when examining results obtained using the Koppitz 2 scoring method,
they found Koppitz error rates to be weakly related to concentration. A more recent study
by Allen (2005) indicated that ADHD-diagnosed children tended to do more poorly on
the Bender Gestalt II than a normal group of children. However, these differences
disappeared when a control variable for intellectual level was introduced. My current
research findings also show that those diagnosed with ADHD have lower scores and thus
could further support Allen’s findings (Shapiro & Simpson, 1995).
The final hypothesis of this study investigated expected differences between the
BGSS and Koppitz 2 scores in a sample of 75 clinically referred children aged 5 to 18.
These two scoring systems were found to be statistically significantly related to each
other on most items, subcategories, and main scoring results with the exception of the
BGSS Recall. Recall is more a measure of memory than drawing accuracy, and therefore
the lack of statistically significant correlation is not surprising in comparison to the other
measures within the scoring systems.
Research by Brannigan, Decker, and Madsen (2004) found a significant
difference in the scoring approach of the Koppitz 2 versus the BGSS, leading them to
describe the Koppitz 2 as “more lenient” in scoring than the BGSS. They noted that
Koppitz 2 focused on specific aspects within a drawing whereas the BGSS scores the
whole gestalt of the drawing. This finding was supported in the current research and may
have resulted in some of the positive findings with regard to the Koppitz 2 and not the
BGSS. However, it is not clear whether the scoring systems' differences were the only factor in
false positive findings or whether other factors, such as a small sample size (75 out of 100 or
more), may have contributed to the increase of false positives.
Additional research showed that when compared to the Beery-Buktenica
Developmental Test of Visual Motor Integration- Fourth Edition (VMI-IV), the Koppitz
2 scoring system was less reliable than the BGSS. Nonetheless, the Koppitz 2 remains the
preferred system for analysis of visual motor perception. In my research, I found that
both systems when combined added the most clinical information in regard to visual-
motor perception, as well as clinical understanding of the child’s test taking behaviors
during the administration of the BGII measure.
Limitations of the Study
The sample of this study initially consisted of a clinical group of 115 children.
Several items from other assessments were found to be missing in the 115 individuals
who completed the Bender Gestalt II test resulting in the lower final total sample of 75.
Additionally, there was no control group of children without problems for comparison in
this study. The majority of the subjects completing the protocol were 8 years and above,
limiting the number of younger children for my research. This may have been a further
limitation given the already small sample size. There may also have been an
unwillingness of parents to answer certain items, such as those related to legal issues and
the child’s conduct on the day of the interview.
There are also limitations related to the demographics of the sample. Participants
in the present study were children living in urban Cleveland, Ohio. This may limit the
possibility of generalization of findings to other geographic locations and populations.
The sample was further limited by a lack of racial diversity. Participants were primarily
African American (n = 63), consistent with previous referral patterns at the research
facility (Youngstrom et al., 2000; Youngstrom et al., 2001; Youngstrom et al., 2005).
Results may differ in larger metropolitan areas, more rural settings, or areas with a
different demographic profile. For example, the data raised some questions about the
impact of demographics, but the sample sizes were too small to come to any conclusions
and further study may be warranted in this area. A further limitation is the failure to
include a measure of intelligence within the study, which leaves the contributions of a
potentially important variable unexplored.
Other limitations of this study are related to the clinical nature of the sample. The
present sample consisted of children who were referred for treatment, and met the criteria
for a DSM-IV-TR diagnosis prior to interview. Upon agreement to be in the study,
children and caregivers were both assessed according to the previously reported protocol.
Data were not collected from anyone in the community who did not have a clinical
relationship with the center. This suggests the possibility of a self-selection bias in
addition to the restriction based on clinical status. Additionally, participants were
grouped according to previously established clinical categories based on identifying
children with bipolar spectrum disorders. Although a strict interview protocol was
maintained, the fact that the larger study was geared toward the diagnosis of bipolar
disorder in children makes it impossible to completely eliminate the possibility of
examiner bias.
Another concern in the present study is that the scores on the CBCL were found
to be two standard deviations above the normal average, suggesting that these children
were exhibiting behaviors in the clinical range at the time of interview. Although this
supports the previously stated limitation as a clinical group, it also should serve as a
caution to the reader not to compare these findings to normal children and adolescents.
In addressing the CBCL assessment for use in this study, previous research has found that
the parent report is a good predictor of diagnostic concerns (Youngstrom et al., 2000).
However, clinicians cannot always be certain about parental motivation or accuracy with
regard to the identification of pathology. That is, a parent may exaggerate certain
symptoms or frequency of behaviors while underreporting their perception of the child’s
internal thoughts and feelings. One way to address this limitation may be to collect
more self-report data from the child clients themselves. However, some studies suggest
that child self-reporting also tends to be lacking in insight and underestimates problem
behaviors (Youngstrom, 2006). Another avenue to address this limitation may be to have
a second reporter complete the CBCL for each child.
Implications for Practice
The BGSS and the Koppitz 2, previously normed for developmental visual motor
integration (Brannigan & Decker, 2003; Reynolds, 2007), were found to be clinically
useful when used with other assessments in diagnosing ADHD in children and
adolescents. However, to be most useful, clinicians should consider the base rate of a
possible ADHD diagnosis within their practices (Frazier, 2006). The use of the BGSS to
evaluate more than visual motor difficulties could be of benefit to clinicians who have
been trained in the use of the original Bender Gestalt (OBG) as a projective measure.
Although this is initial research, the results of comparing the Bender Gestalt II with the
CBCL and a semistructured interview do suggest that the Bender has some clinical
utility. The findings support the use of the Bender Gestalt II in children and adolescents
as a screening instrument for visual perceptual difficulty, impulsivity, short-term memory
recall, and organizational ability of the individual. The findings in the study do not
support using the BGII as a purely projective measure. The BGII may be able to
distinguish healthy children and adolescents from those with psychopathology as the
OBG did, as previously reported by Bender (1938) and Koppitz (1975) along with other
proponents of the Bender. However, that assertion cannot be made in this study because
all children evaluated in this study were already from a clinical population.
Results of this study indicate that the Bender Gestalt II measures aspects other
than simple visual motor perception and possibly begins to provide insight into the
differentiation between types of psychopathology. Incremental validity improves when
this test is combined with other standardized interview techniques. Therefore, if the
clinician has difficulty in clearly making an ADHD diagnosis, the Bender Gestalt II may
provide further clinical evidence. It was found that those with ADHD had several
significant correlations with the CBCL and WASH-U-KSADS-PL+. Children diagnosed
with ADHD tended to have poorer recall and lower overall Koppitz 2 scores than others.
Although my study is exploratory in nature, it hopefully represents the beginning
of more empirically driven research into the utility of the BGII. This investigation,
conducted in the framework of a clinical setting, is intended to have applied research
implications. The clinical setting as a condition of the research may have created some
further limitations. Evidence supporting the use of the Bender for clinical practice as a
diagnostic instrument was not fully achieved within this study, as had been hypothesized.
This study employed two scoring systems of the BGII: the BGSS published in 2003 and
the Koppitz 2 published in 2007. Future researchers might wish to review other scoring
methods that were developed to identify psychopathology using the older version of the
Bender. Further research is needed to support the work of Brannigan (2003), who applied
the previously-developed Hutt scales to the BGII and found it reliable and valid for
personality assessment.
This study only reviewed the BGSS with the Koppitz 2 items for overall
correlation but did not address scoring on individual items within those measures. The
BGII test was selected for this study as a projective measure because it is time efficient
and simple for clinicians to administer. It is not the only projective measure available to
clinicians and it would be beneficial to explore other projective measures as well for
possible increased utility as they relate to empirically standardized measures, such as the
KSADS-PL+ and the CBCL.
It would also be of benefit for follow up research to address incremental validity
of the Bender Visual Motor Gestalt II in relation to other screening tests utilized in the
larger ABACB study (Youngstrom, 2005), such as the General Behavior Inventory or the
Wechsler Intelligence Scale for Children-IV. This would add to previous research by
identifying strengths of the BGII test as a useful measure of client psychopathology. It
also could further address the current hypothesis of the BGII as a useful projective
measure in clinical diagnostic decision making. As the ABACB study has a longitudinal
component with a 5-year follow-up, the Bender Gestalt II could be readministered and
compared within an adult population (Garb, 2003). Therefore, further research could
address situational changes and consistency among the drawings in an individual for this
study over a 5-year time period. These findings could be very valuable in determining the
long term usefulness of the BGII test as a clinical measure.
Conclusions
The field of psychology has evolved and grown significantly over the last 50
years and currently there are many more clinical assessment tools that are attractive to
psychologists because of their perceived clinical usefulness. However, in some cases
these are being pushed aside for measures and checklists that, while useful, may be too
transparent in their questioning style. The publication of a revised Bender Gestalt II test
in 2003 provided this researcher with the opportunity to look at the current clinical utility
of one such historical diagnostic measure still in use.
The use of the Bender Visual Motor Gestalt Test II as a direct measure of internal
mental state was explored as a possible addition to the tools used in reaching a clinical
diagnosis. It may provide an important “missing link” in the current evaluation process
by helping to bridge the gap between parent and child symptom reporting. The current
results suggest that the Bender Visual Motor Gestalt Test II is a less than adequate
screening tool for diagnosing clients with unipolar depression versus bipolar disorders,
but is somewhat more useful for identifying childhood disruptive behavioral disorders.
With the publication of the Bender Gestalt II, the instrument has been standardized with
specific instructions for administration and scoring. The test possesses adequate validity
and reliability as a test of visual motor integration.
As discussed, findings regarding the use of the Bender II as a projective
instrument were mixed depending on which scoring method was used. It appears that
internalizing children versus those considered externalizers or those with high levels of
total problems draw differently in overall quality, resulting in significantly different
scores derived from the Bender Visual Motor Gestalt Test II Global Scoring System and
Koppitz 2 scoring systems. Thus it may be that the Bender drawings do assess some
internal states of the individual and may reflect some individual Gestalt processes as
originally proposed by Wertheimer and Bender (Bender, 1938). However, results did not
support the use of Koppitz Emotional Indicators as a measure of psychopathology.
The BGSS and Koppitz 2 results in regard to significant findings of the KSADS
diagnoses were disappointing and limited. It was found that the only diagnosis
consistently related to BGII performance was the diagnosis of ADHD. It was also found
that Koppitz 2 scores based on error rates were more sensitive than the Bender Global
Scoring system in clinician decision making regarding ADHD. The Global Scoring
System identified Recall Scores to be related to ADHD, but was not found to be as
predictive of ADHD as the Koppitz 2.
In conclusion, the findings of the present study indicate that the Bender Gestalt II
may have use beyond its traditional value as a measure of visual motor ability. Despite
the mixed results in supporting the main hypotheses, the current findings are useful. This
is an initial investigation that suggests that further research on the Bender Gestalt II as a
screening tool for childhood pathology would be valuable.
References
Achenbach, T. M. (1991). Manual for the Child Behavior Checklist/4-18 and 1991
Profile. Burlington: University of Vermont, Department of Psychiatry.
Achenbach System of Empirical Based Assessment. (2006). Achenbach Child Behavior
Checklist. Retrieved from http://www.aseba.org/
Achenbach, T. M., & Rescorla, L. A. (2001). Manual for ASEBA School-Age Forms &
Profiles. Burlington, VT: University of Vermont, Research Center for Children,
Youth, & Families.
Aklin, W. M., & Turner, S. M. (2006). Toward understanding ethnic and cultural factors
in the interviewing process. Psychotherapy: Theory, Research, Practice,
Training, 43(1), 50-64.
Allen, R. A. (2005). Utility of the Bender Gestalt–Second edition in the assessment of
attention-deficit/hyperactivity disorder. Dissertation Abstracts International:
Section B: The Sciences and Engineering, 65(11-B), 6035.
Altman, D. G., & Bland, J. M. (1994a). Diagnostic tests 1: Sensitivity and specificity.
British Medical Journal, 308, 1552-1561.
Altman, D. G., & Bland, J. M. (1994b). Diagnostic tests 3: Receiver operating
characteristic plots. British Medical Journal, 309, 188-194.
Ambrosini, P. J. (2000). Historical development and present status of the schedule for
affective disorders and schizophrenia for school age children (K-SADS). Journal
of the American Academy of Child and Adolescent Psychiatry, 39(1), 49-58.
American Psychiatric Association. (2000). Diagnostic and Statistical Manual of Mental
Disorders (4th ed., text revision). Washington, DC: Author.
Archer, R. P., Maruish, M., Imhof, E. A., & Piotrowski, C. (1991). Psychological test
usage with adolescent clients: 1990 survey findings. Professional Psychology:
Research and Practice, 22, 247-252.
Barker, C., & Pistrang, N. (2004). Quality criteria under methodological pluralism:
Implications for conducting and evaluating research. Retrieved from
http://www.ucl.ac.uk/publications/papers.
Belter, R.W., McIntosh, J. A., Finch, A. J., & Williams, L. D. (1989). The Bender Gestalt
as a method of personality assessment with adolescents. Journal of Clinical
Psychology, 45, 414-423.
Bender, L. (1938). A Visual Motor Gestalt Test and its clinical use: Research
monographs No. 3. New York, NY: The American Orthopsychiatric Association.
(Original work published 1938).
Bigler, E. D., & Ehrfurth, J. W. (1981). The continued inappropriate singular use of the
Bender visual motor gestalt test. Professional Psychology: Research and
Practice, 12, 562-569.
Bowland, J. A., & Deabler, H. L. (1956). A Bender-Gestalt diagnostic validity study.
Journal of Clinical Psychology, 12 (1), 82-84.
Brannigan, G. G., & Decker, S. L. (2003). The Bender Visual-Motor Gestalt Test (2nd
ed.). Itasca, IL: Riverside.
Brannigan, G. G., & Decker, S. L. (2006). The Bender-Gestalt II. American Journal of
Orthopsychiatry, 76 (1), 10-12.
Brannigan, G. G., Decker, S. L., & Madsen, D. H. (2004). Innovative features of the
Bender Gestalt II and expanded guidelines for the use of the global scoring
system. Itasca, IL: Riverside.
Canter, A. U. (1968). BIP Bender test for the detection of organic brain disorder:
Modified scoring method and replication. Journal of Consulting and Clinical
Psychology, 32, 522-526.
Cashel, M. L. (2002). Child and adolescent psychological assessment: Current clinical
practices and the impact of managed care. Professional Psychology: Research
and Practice, 33, 446-453. doi:10.1037/0735-7028.33.5.44
Charman, T., & Baird, G. (2002). Practitioner review: Diagnosis of autism spectrum
disorder with 2-and 3-year old children. Journal of Child Psychology and
Psychiatry, 43, 289-305.
Choi, B. C. K. (1998). Slopes of a receiver operating curve and likelihood ratios for a
diagnostic test. American Journal of Epidemiology, 148, 1127-1132.
Decker, S. L., Allen, R., & Choca, J. P. (2006). Construct validity of the Bender Gestalt
II: Comparison with Wechsler Intelligence Scale for Children-III. Perceptual
Motor Skills, 102(1), 133-141.
DeClercq, B., DeFruyt, F., Van Leewen, K., & Mervielde, I. (2006). The structure of
maladaptive personality traits in childhood: A step toward an integrative
developmental perspective for DSM-V. Journal of Abnormal Psychology, 115,
639-657.
Dryden, W. (1986). Eclectic Psychotherapies: A critique of leading approaches. In J.C.
Norcross (Ed.) Handbook of Eclectic Psychotherapy (pp. 353-375). New York,
NY: Brunner/Mazel.
Elbert, J. C., & Holden, E. W. (1987). Child diagnostic assessment: Current trends in
clinical psychology internships. Professional Psychology: Research and Practice,
18, 587-596.
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis
program. Behavior Research Method, Instruments & Computers, 28, 1-11.
Fidal, C. A. (2004). Examining abuse indicators on the Bender Gestalt Test. Dissertation
Abstracts International: Section B: The Sciences and Engineering, 64 (8-B),
4003.
Field, K., Bolton, B., & Dana, R. H. (1982). An evaluation of three bender-gestalt scoring
systems as indicators of pathology. Journal of Clinical Psychology, 38, 838-842.
Findling, R. L., Gracious, B. L., McNamara, N. K., Youngstrom, E. A., Demeter, A. C.
A., Branicky, L. A., & Calabrese, J. R. (2001). Rapid, continuous cycling and
psychiatric co-morbidity in pediatric bipolar I disorder. Bipolar Disorders, 3, 202-
210.
Findling, R. L., Youngstrom, E. A., Danielson, C. K., DelPorto-Bedoya, D., Papish-
David, R., & Townsend, L. (2002). Clinical decision-making using the General
Behavior Inventory in juvenile bipolarity. Bipolar Disorders, 4, 34-42.
Findling, R. L., Youngstrom, E. A., McNamara, N. K., Stansbrey, R. J., Demeter, C. A.,
Bedoya, D., … Calabrese, J. R. (2005). Early symptoms of mania and the role of
parental risk. Bipolar Disorders, 7, 623-634.
First, M. B., Pincus, H. A., Levine, J. B., Williams, J. B., Ustun, B., & Peele, R. (2004).
Clinical utility as a criterion for revising psychiatric diagnoses. American Journal
of Psychiatry, 161, 946-954.
Frazier, T. (2006). Evidence based assessment of Attention Deficit Hyperactivity
Disorder. Cleveland, OH: Applewood Centers.
Garb, H. N. (2003). Incremental validity and the assessment of psychopathology in
adults. Psychological Assessment, 15, 508-520. doi:10.1037/1040-3590.15.4.508
Geller, B., Zimmerman, B., Williams, M., Bolhofner, K., Craney, J. L., Delbello, M. P.,
& Soutullo, C. (2001). Reliability of the Washington University in St. Louis
Kiddie Schedule for Affective Disorders and Schizophrenia (WASH-U-KSADS)
mania and rapid cycling sections. Journal of the American Academy of Child and
Adolescent Psychiatry, 40, 450-455.
Gerber, S., Appleton, V., Dykeman, J.C., Sampson, D., & Toews, J. (1994). The vital
balance revisited or the resolution of the Counseling Profession’s identity split.
Journal of Counseling Psychology, 37, 2-14. (ERIC Document Reproduction
Service No. ED373279)
Ghanizadeh, A., Mohammadi, M. R., & Yazdanshenas, A. (2006). Psychometric
properties of the Kiddie Schedule for Affective Disorders and Schizophrenia-Present
and Lifetime Version. BMC Psychiatry, 6(10). Retrieved from
http://www.biomedcentral.com/1471-244X/6/10
Gracious, B. L., Youngstrom, E. A., Findling, R. L., & Calabrese, J. R. (2002).
Discriminative validity of the parent version of the Young Mania Rating Scale.
Journal of the American Academy of Child and Adolescent Psychiatry, 41, 1350-
1359.
Hamza, T. H. (2008). Meta analyses of diagnostic test evaluation data: Random effects
approaches. Annals of Internal Medicine, 149, 889-897. doi: 978-90-9023002-3.
Horn, W.F., & O’Donnell, J.P. (1984). Early identification of learning disabilities: A
comparison of two methods. Journal of Educational Psychology, 76, 1106-1110.
Hothersall, D. (1995). History of psychology (3rd ed.). New York, NY: McGraw-Hill.
Hutt, M. L. (1985). The Hutt adaptation of the Bender-Gestalt Test (4th ed.). Orlando, FL:
Harcourt Brace Jovanovich.
Imm, P. S., Kim, Y. F., Belter, R. W., & Finch, A. W. (1991). Assessment of short-term
visual memory in child and adolescent psychiatric inpatients. Journal of
Clinical Psychology, 47, 441-443.
Ivanova, M. Y., Achenbach, T. M., Dumenci, L., Rescorla, L.A., Almqvist, F.,
Weintraub, S., … Frigerio, A. (2007). Testing the eight-syndrome structure of the
Child Behavior Checklist in 30 societies. Journal of Clinical Child and
Adolescent Psychology, 36, 405-417.
Jaeschke, R., Guyatt, G. H., & Sackett, D. L. (1994). Users’ guides to the medical
literature: III. How to use an article about a diagnostic test: Section B: What are
the results and would they help me in caring for my patients? Journal of the
American Medical Association, 271, 389-391.
Jellinek, M.S., & McDermott, J.F. (2004). Formulation: Putting the diagnosis into a
therapeutic context and treatment plan. Journal of the American Academy of Child
and Adolescent Psychiatry, 43, 913-917.
Kamphaus, R.W., Petoskey, M. D., & Rowe, E. W. (2000). Current trends in
psychological testing of children. Professional Psychology: Research and
Practice, 31(2), 155-164. doi:10.1037/0735-7028.31.2.155
Kaufman, J., Birmaher, B., & Brent, D. (2003). Schedule for Affective Disorders and
Schizophrenia for School-Age Children-Present and Lifetime Version (K-SADS-PL):
Initial reliability and validity data. Journal of the American Academy of Child
and Adolescent Psychiatry, 36, 980-988.
Keogh, B. (1965). School achievement associated with successful performance on the
Bender Gestalt Test. Journal of School Psychology, 3(3), 37-40.
Klein, D. N., Ouimette, P. C., Kelly, H. S., Ferro, T., & Riso, L. P. (1994). Test-retest
reliability of team consensus best-estimate diagnoses of Axis I and II disorders in
a family study. The American Journal of Psychiatry, 151, 1043-1047.
Kooistra, L., Crawford, S., Dewey, D., Cantell, M., & Kaplan, B. J. (2005). Motor
correlates of ADHD: contribution of reading disability and oppositional defiant
disorder. Journal of Learning Disabilities, 38(3), 195-206.
Koppitz, E. M. (1968). Psychological evaluation of children’s human figure drawings.
New York, NY: Grune & Stratton.
Koppitz, E. M. (1971). Children with learning disabilities: A five year follow-up study.
New York, NY: Grune & Stratton.
Koppitz, E. M. (1975). The Bender Gestalt Test for young children, Volume II: Research
and application, 1963-1973. New York, NY: Grune & Stratton.
Krueger, R., & Finger, M. (2001). Using item response theory to understand comorbidity
among anxiety and unipolar mood disorders. Psychological Assessment, 13(1),
140-151.
La Fiosca, T., & Loyd, B. (1986). Defensiveness and the assessment of parent stress and
anxiety in parents. Journal of Clinical Psychology, 15, 254-59.
Lengua, L. J., Sadowski, C. A., Friedrich, W. N., & Fisher, J. (2001). Rationally and
empirically derived dimensions of children’s symptomatology: Expert ratings and
confirmatory factor analyses of the CBCL. Journal of Consulting and Clinical
Psychology, 69, 683-698.
Lipovsky, J. A., Finch, A. J., & Belter, R.W. (1989). Assessment of depression in
adolescence: Objective and projective measures. Journal of Personality
Assessment, 53, 449-458.
Luebbe, A. M., Radcliffe, A. M., Callands, T. A., Green, D., & Thorn, B. E. (2007).
Evidence-based practice in psychology: Perceptions of graduate students in
scientist–practitioner programs. Journal of Clinical Psychology, 63, 643-655.
McCormick, T. T., & Brannigan, G. G. (1984). Bender Gestalt Signs as indicants of
anxiety, withdrawal, and acting-out behaviors in adolescents. Journal of
Psychology, 118(1), 71-74.
McKeon, P. O., Medina, J. M., & Hertel, J. (2006). Hierarchy of research design in
evidence-based sports medicine. Athletic Therapy Today, 11(4), 41-45.
Mehlman, B., & Vatovec, E. (1956). A validation study of the Bender-Gestalt. Journal of
Consulting Psychology, 20(1), 71-74.
Miyake, A., Friedman, N. P., Rettinger, D. A., Shah, P., & Hegarty, M. (2001). How are
visuospatial working memory, executive functioning, and spatial abilities related?
A latent variable analysis. Journal of Experimental Psychology: General, 130,
621-640.
Morgan, R. D., Olson, K. R., Krueger, R. M., Schellenberg, R. P., & Jackson, T. (2000).
Do the DSM decision trees improve diagnostic ability? Journal of Clinical
Psychology, 56(1), 73-88.
doi:10.1002/(SICI)1097-4679(200001)56:1<73::AID-JCLP7>3.0.CO;2-I
Naglieri, J. A., & Pfeiffer, S. I. (1992). Performance of disruptive behavior disordered
and normal samples on the Draw A Person: Screening Procedure for Emotional
Disturbance. Psychological Assessment, 4(2), 156-159.
Oas, P. (1984). Validity of a Draw-A-Person and Bender-Gestalt as measures of
impulsivity with adolescents. Journal of Consulting and Clinical Psychology, 52,
1011-1019.
Pascal, B. J., & Suttell, G. R. (1952). "Regression" in schizophrenia as determined by
performance on the Bender-Gestalt test. The Journal of Abnormal and Social
Psychology, 47, 653-657.
Perticone, E. X. (1998). The clinical and projective use of the Bender-Gestalt test.
Springfield, IL: Charles C Thomas.
Pilkonis, P. A., Heape, C. L., Ruddy, J., & Serrao, P. (1991). Validity in the diagnosis of
personality disorders: The use of the LEAD standard. Psychological Assessment,
3(1), 46-54.
Piotrowski, C. (1995). A review of the clinical and research use of the Bender-Gestalt
Test. Perceptual and Motor Skills, 81, 1272-1274.
Piotrowski, C., & Keller, J.W. (1989). Psychological testing in outpatient mental health
facilities. Professional Psychology: Research and Practice, 20, 423-425.
Raggi, V. L., & Chronis, A. M. (2006). Interventions to address the academic impairment
of children and adolescents with ADHD. Clinical Child and Family Psychology
Review, 9(2), 85-111. doi:10.1007/s10567-006-0006-0
Reynolds, C. R. (2007). Koppitz developmental scoring system for the Bender-gestalt
test-2nd edition (Koppitz 2) Rater’s manual. Austin, TX: Pro-Ed Inc.
Roid, G. H. (2003). Stanford-Binet intelligence scales (5th ed.). Itasca, IL: Riverside.
Rossini, E. D. (1983). The Bender-Gestalt psychopathology scale: Failure to infer
validity in a school-aged sample. Journal of Personality Assessment, 51, 254-261.
Rossini, E. D., & Kaspar, J. C. (1987). The validity of the Bender-Gestalt emotional
indicators. Journal of Personality Assessment, 51, 254-261.
Shapiro, S. K., & Simpson, R. G. (1995). Koppitz Scoring System as a measure of
Bender-Gestalt performance in behaviorally and emotionally disturbed
adolescents. Journal of Clinical Psychology, 51(1), 108-112.
Smith, D. J., Muir, W. J., & Blackwood, D. H. (2006). Neurocognitive impairment in
euthymic young adults with bipolar spectrum disorder and recurrent major
depressive disorder. Bipolar Disorders, 8(1), 40-46.
Sourander, A., Haavisto, A., Ronning, J. A., Multimaki, P., Parkkola, K., Santalahti, P., …
Almqvist, F. (2005). Recognition of psychiatric disorders, and self perceived
problems. A follow up study from age 8 to age 18. Journal of Child Psychology
and Psychiatry, 46, 1124-1134.
Spitzer, R. L. (1983). Psychiatric diagnosis: Are clinicians still necessary?
Comprehensive Psychiatry, 24, 399-411.
Stewart, H. F. (1957). A note on recall patterns using the Bender Gestalt with psychotic
and non-psychotic patients. Journal of Clinical Psychology, 13, 95-97.
Tolan, P. H., & Dodge, K. A. (2005). Children’s mental health as a primary care and
concern: A system for comprehensive support and service. American
Psychologist, 60, 601-614.
Valderhaug, R., & Ivansson, T. (2005). Functional impairment in clinical samples of
Norwegian and Swedish children and adolescents with obsessive-compulsive
disorder. European Child & Adolescent Psychiatry, 14(3), 164-173.
Wassenberg, R., Max, J. E., Koele, S. L., & Firme, K. (2004). Classifying psychiatric
disorders after traumatic brain injury and orthopedic injury in children; adequacy
of KSADS versus CBCL. Brain Injury, 18, 377-390.
doi:10.1080/02699250310001617325
Wertheimer, M. (1923). Laws of organization in perceptual forms. First published as
Untersuchungen zur Lehre von der Gestalt II, in Psychologische Forschung, 4,
301-350. Translation published in Ellis, W. (1938). A source book of Gestalt
psychology (pp. 71-88). London: Routledge & Kegan Paul.
Wilson, M. S., & Reschly, D. J. (1996). Assessment in school psychology training and
practice. School Psychology Review, 25(1), 9-23.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock Johnson III Tests of
Achievement. Itasca, IL: Riverside.
Youngstrom, E. (2005). Improving the assessment of juvenile bipolar disorder.
Unpublished Training Manual. (NIH R01 5MH066647)
Youngstrom, E. A. (2006). Training manual for ABACAB. Unpublished work for
Applewood Centers, Cleveland, OH.
Youngstrom, E. A., & Duax, J. (2005). Evidence-based assessment of pediatric
bipolar disorder, Part I: Base rate and family history. Journal of the American
Academy of Child and Adolescent Psychiatry, 44, 712-717.
Youngstrom, E. A., Findling, R. L., Danielson, C. K., & Calabrese, J. R. (2001).
Discriminative validity of parent report of hypomania and depressive symptoms
of the General Behavior Inventory for juvenile bipolarity. Psychological
Assessment, 13, 267-276.
Youngstrom, E. A., Loeber, R., & Stouthamer-Loeber, M. (2000). Patterns and correlates
of agreement between parent, teacher, and male adolescent ratings of
externalizing and internalizing problems. Journal of Consulting and Clinical
Psychology, 68, 1038-1050.
Youngstrom, E. A., Meyers, O., Demeter, C., Youngstrom, J., Morello, L., Piiparinen,
R., … Findling, R. L. (2005). Comparing diagnostic checklists for pediatric
bipolar disorder in academic and community mental health settings. Bipolar
Disorders, 7, 507-517.
Zweig, M. H., & Campbell, G. (1993). Receiver operating characteristic (ROC) plots: A
fundamental evaluation tool in clinical medicine. Clinical Chemistry, 39, 561-577.
Appendix A: IRB Protocol
University Hospitals of Cleveland
Case Western Reserve University IRB Number 01-02-39
Title: Improving the Assessment Process of Children
Proposed IRB Addendum (2007)
Overview:
The proposed amendment would add one measure of developmental visual motor
ability to the ongoing study. The addition of this measure would not increase the length
of the visit for the participating families, because the parent interview portion takes
substantially longer than the youth interview component. The measure would therefore
fill a gap during which the youth would otherwise be waiting for the parent to complete
their portion of the assessment. The introduction of this measure
would provide valuable information regarding the individual functioning of the child and
would form the basis of a doctoral dissertation comparing the KSADS-PL with the
Bender Gestalt Visual Motor Integration Test using two newly reintroduced results. Thus
the addition would add greatly to the validation of clinically driven measurements,
contribute to a limited body of knowledge regarding these measures, and have the
potential to fulfill important educational goals.
Rationale:
Using a group of participants already being assessed with approved research
methods and deidentified data, a secondary analysis would be performed incorporating
the Bender Visual Motor Gestalt Test II (Brannigan & Decker, 2006). The Bender Gestalt
II is a revision of the original Bender Gestalt Test developed by Bender in 1938 (Bender,
1938), which had been one of the most used measures in psychological assessments until
the 1990s (Archer, Maruish, Imhof, & Piotrowski, 1991; Brannigan & Decker, 2006). Its
clinical attractiveness was threefold: it was quick to give, taking less than 10 minutes; it
served as an ice breaker in most psychological evaluations because of its ease for the
subject; and it was found to be a cross-cultural, nonverbal measure with scoring
capabilities for a large age range (Brannigan & Decker, 2006). The Bender fell out of
favor with many psychologists due to research cautioning against the broad use of the
Bender for diagnostic evaluations (Bigler & Ehrfurth, 1981) and low reliability with
regard to its use for projective purposes (Naglieri & Pfeiffer, 1992).
The reality is that clinicians want effective diagnostic instruments like
the KSADS-PL yet have to manage the time/cost balance in practice. This study allows
the unique opportunity of evaluating the Bender Gestalt II as a screening instrument
with regard to diagnostic criteria, comparing the clinical efficiency of the Bender with
the diagnostic reliability of the KSADS-PL. Completing the Bender Gestalt-II measure is
not difficult for the client and typically takes less than 15 minutes total time for the
participant, and would not add any time to the total amount that families spend
participating in the current study. Individual functioning of children is often evaluated
to determine level of current functioning/ability in paper and pencil tasks such as the
BVMGT-II. Over the years, such testing instruments have been compared to various
constructs such as achievement levels, perceptual ability, visual motor ability,
developmental capability, and various emotional symptoms related to various diagnostic
categories (Bender, 1938; Decker, Allen, & Choca, 2006; Hutt, 1985; Valderhaug &
Ivansson, 2005).
Expedited - IRB Protocol - Non Human Subjects Research
To: Tunick, Roy
From: WVU Office of Research Compliance
Date: Monday, March 24, 2008
Subject: No action required
Tracking #: H-20822
Title: The Clinical utility of the Bender Gestalt II and the Koppitz 2
compared with the KSADS-PL+ in children 5 to 18.
Thank you for your submission to the West Virginia University
[boardname] Institutional Review Board. On [Review Date Not Found], the
submission was reviewed. As described, it was determined that your
project does not constitute human subject research, per the 10 August
2004 Office for Human Research Protections Guidance on Research
Involving Coded Private Information or Biological Specimens. Please
contact the IRB office for a copy of the guidance.
Furthermore, it was determined that:
In discussions with the PI, it is clear that the data to be analyzed will be
deidentified before the PI receives it. There is no way for the PI to match a
name to the data and therefore no identifiable individual. The study is
therefore not human subject research. If you have any questions, please
contact the IRB at (304) 293-7073.
Thank you.
Board Designee: Ast, Lilo
Letter Sent By: Ast, Lilo, 3/24/2008 6:03 PM
Appendix B: CREC Program Notice of Certification
Linda Marnic,
Congratulations! You are Core Certified in the Continuing Research Education
Credit (CREC) Program managed by Case Western Reserve University (Case).
To obtain Core certification you passed the online CITI Core Training course.
Core Certification means that you have the ability to request review of
proposed human subject research by Institutional Review Boards at
the following institutions: University Hospitals of Cleveland, The Metro Health
System, and Case Western Reserve University.
Certification in the CREC Program also means that you have met the NIH
educational requirements for the involvement of human participants in research
for Key Personnel. This certification is valid for 3 years and will expire on
11/15/2009. To be Re-Certified after this date you must obtain 12 CRECs before
your Core Certification expires. The following is the URL to our educational web
site, which outlines the training options available for continuing certification and
provides instructions for obtaining information on your current CREC status by
looking up your account information on SPRIDERWEB, Sponsored Projects
Information and Data Entry and Retrieval Website.
http://ora.ra.cwru.edu/orc_education.asp
You may view a printable “Certificate of Achievement” by going here. Please
maintain this document for your records. Copies of this document may also be
submitted to sponsors to indicate compliance with human subject education
requirements.
I hope this information is helpful. Please feel free to contact me if I can be of
further assistance.
The CREC Program
Sears 657, CWRU
216.368.6925
Appendix C: Administration Manual for ABACAB
Bender Gestalt II Administration Manual for ABACAB
I. Who Administers the BG II?
Either Rater 1 or Rater 2, with appropriate training, will administer the test. The
raters will decide as a team who administers it.
Suggested time of testing: If Rater 2 administers the standard protocol, then the
BG II should be administered prior to the questionnaires. If Rater 1 administers the
protocol, then the BG II should be administered at the end of the KSADS if time
permits.
If the family is finished before 3:30 pm, the BG II must be administered before
finalizing with the family.
II. What is needed to administer the BG II?
The BG II stimulus cards, BG II observation form (ABACAB version), a motor test, a
perception test, 2 pencils with erasers, 10 sheets of blank printer paper, and a
timekeeping device (a stopwatch, or a watch or clock that displays seconds).
III. What is the administration order?
The official and only acceptable administration order is: copy, recall, motor, and
perception. Each phase occurs immediately after the previous phase. There are no
breaks between phases, unless absolutely necessary (this should be marked).
IV. How do I administer the Copy test?
Place the cards in order, design side down. Place a blank sheet of paper vertically and
a pencil in front of the child. Read the following directions:
I have a number of cards here. Each card has a different drawing on it. I will
show you the cards one at a time. Use this pencil to copy the drawing from each
card onto this sheet of paper. Try to make your drawings look just like the
drawings on the cards. There are no time limits, so take as much time as you
need. Do you have any questions? Here is the first card.
Show the child the first card and mark the starting time (include seconds). Administer
cards 1 through 13, in order, to children below age 8. Administer cards 5 through 16, in
order, to children age 8 and above. Children may erase and may use more than 1 sheet of
paper. Do not allow the children to touch the stimulus cards or to draw or doodle nontest
figures on the paper.
If the examinee becomes discouraged say:
Do the best you can.
If the examinee asks where to start drawing any figure, say:
Begin wherever you like.
When complete, record the time finished (include seconds). Label the child’s
sheet COPY SHEET, and write TOP at the edge the child treated as the top of the page.
Things to be marked on the Observation sheet by the administrator:
1) Direction and order of drawing
2) Describe any counting that occurs
3) If any of the test-taking observations occur on more than 2 items, check the
box.
4) Mark the tilt of the paper.
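The card-selection rule and the fixed phase order described in this manual can be sketched in a few lines of Python. This is only an illustration: the function name `copy_phase_cards` and the constant `PHASE_ORDER` are hypothetical, and the sketch assumes that a child who is exactly 8 years old receives cards 5 through 16.

```python
# Hypothetical helper illustrating the card-selection rule in this manual.
# Assumption: a child aged exactly 8 is given cards 5 through 16.

PHASE_ORDER = ("copy", "recall", "motor", "perception")  # order from section III


def copy_phase_cards(age_years: int) -> list[int]:
    """Return the stimulus card numbers to administer in the Copy phase."""
    if age_years < 8:
        return list(range(1, 14))  # cards 1 through 13
    return list(range(5, 17))      # cards 5 through 16


if __name__ == "__main__":
    for age in (6, 8, 12):
        cards = copy_phase_cards(age)
        print(age, cards[0], cards[-1], len(cards))
```

Under these assumptions the younger group copies 13 designs and the older group copies 12.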
V. How do I administer the Recall Phase?
Immediately after the Copy phase, give the child a new sheet of paper placed
vertically. Read:
Now, I want you to draw as many of the designs that I just showed you as you
can remember. Draw them on this new sheet of paper. Try to make your
drawings just like the ones on the cards that you saw earlier. There are no time
limits, so take as much time as you need. Do you have any questions? Begin.
Begin timing (include seconds). Stop timing when the child finishes all figures or cannot
recall any more designs after 2 minutes. Record the end time (include seconds). Label
the top of the sheet RECALL SHEET and TOP (marking the top of the sheet as the child
oriented it).
On the Observation sheet, mark the order in which the items were recalled.
VI. How do I administer the Motor Test?
Immediately after the recall phase, hand the child the BG II Motor Test. Say:
For each item, start with the largest figure. For each figure draw a
line connecting the dots without touching the borders. Do not lift the pencil,
erase, or tilt the paper while drawing. Try the sample item. Do you have any
questions? Now you try it.
Make sure the child completes all items. Repeat directions as needed.
VII. How do I administer the Perception Test?
Immediately after the Motor Test, hand the child the BG II Perception Test. Say:
Look at this picture (point to the design in the first box). There is another
picture that looks just like it in this row (run finger across the first row). Circle or
point to the picture that looks just like this one (point at design in first box).
If the child needs assistance for any item say:
Which one of these pictures looks like this one? (Point at
picture in first box of row)
If the child takes more than 30 seconds on an item, say:
Let’s try the next one.
Write an S next to any skipped items.
VIII. How do I score the BG II Copy and Recall Phases?
Each item is scored from 0 to 4 (from no resemblance to nearly perfect). Please
score each item according to the following pictures. If uncertain, refer to the
Bender Gestalt II manual for further pictures.
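As a rough illustration of the 0 to 4 item scale, the sketch below checks each rating and adds the ratings for one phase. It assumes the phase total is the simple sum of item ratings, which this manual excerpt does not itself state; the function name `phase_raw_score` and the example ratings are hypothetical.

```python
# Hypothetical sketch: validate 0-4 item ratings and total them for one phase.
# Assumption: the phase raw score is the plain sum of the item ratings.

def phase_raw_score(item_scores: list[int]) -> int:
    """Sum item ratings after checking that each lies on the 0-4 scale."""
    for position, score in enumerate(item_scores, start=1):
        if score not in range(0, 5):
            raise ValueError(f"Item {position} has out-of-range score {score}")
    return sum(item_scores)


if __name__ == "__main__":
    # Made-up ratings for twelve designs, for illustration only.
    example = [4, 3, 2, 0, 1, 4, 4, 3, 2, 2, 1, 3]
    print(phase_raw_score(example))
```

Rejecting out-of-range values at entry is a design choice in this sketch so that a mistyped rating is caught before it affects a total.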
**Instructions for this manual were from the Bender Gestalt II Manual (2006).
Appendix D: Demographic Form
Demographic Forms
CONTACT INFORMATION                         Subject #: ______________
Child’s Name: _______________________________________________
              First              Middle              Last
Nickname: ___________________   Child’s Date of Birth: _____/_____/_______
Address: ________________________________________________________
         ________________________________________________________
Parent/Guardian’s Name: ________________________________________________
Relationship of guardian to child: ____________________________________
Address (if different from child’s): ____________________________________
                                     Street/Apt#    City    State    Zip
Phone: ( ) ________ - __________   Work Phone: ( ) ________ - ____________
Cell Phone: ( ) _______ - ___________   User of Cell: ______________________
E-mail address: _______________________________
If applicable, other parent/guardian’s name: ___________________________________
Relationship of other parent/guardian to child: _________________________________
Other parent/guardian’s home address: _______________________________________
_________________________________________________________
City    State    Zip
Home telephone (if different): ( ) ________ - ____________
Work Phone: ( ) ________ - ____________
Cell Phone: ( ) _________ - ____________   User of Cell: _____________________
Emergency contact person: ______________________________
Relation (if known): __________________________
Phone: ( ) ________ - ___________
Alt. Emergency contact person: ______________________________
Relation (if known): __________________________
Phone: ( ) ________ - ___________
Date of Assessment: ________________________
Rater 2 Interview with Primary Caregiver
ACI ID____________
Date of Review ________________
Employ these codes on the pages that follow:
Race/Ethnicity
0 Native American or Alaskan native
1 Asian/Pacific Islander
2 Black/African American, not of Hispanic origin
3 Latino/Hispanic
4 White/Caucasian, not of Hispanic origin
5 Other
Highest Level of Education Completed:
1 Elementary school (less than 7 years of school)
2 Junior high school (7-9 years)
3 Partial high school (10-11 years)
4 High school graduate (includes G.E.D.)
5 1 to 3 years of college, business or trade school
6 College or university graduate (four-year college graduate)
7 Graduate school or professional training (at least 1 year)
8 Completed graduate school
Occupation:
0 Unemployed
1 Menial Service Worker, Farm Laborer
2 Unskilled Worker
3 Machine Operator, Semiskilled Worker
4 Smaller Business Owner, Skilled Manual Worker, Craftsmen, Tenant Farmers
5 Clerical and Sales Workers, Small Farm and Business Owners
6 Technicians, Semiprofessionals, Small Business Owners
7 Smaller Business Owners, Farm Owners, Managers, Minor Professionals
8 Administrators, Lesser Professionals, Proprietors of Medium Businesses
9 Higher Execs, Proprietors of Large Businesses, Major Professionals
Annual Income
1 $0 - $4,999
2 $5,000 - $9,999
3 $10,000 - $14,999
4 $15,000 - $19,999
5 $20,000 - $29,999
6 $30,000 - $39,999
7 $40,000 - $49,999
8 $50,000 - $74,999
9 $75,000 - $99,999
10 $100,000 - $149,999
11 $150,000 - $200,000
12 More than $200,000
11 Mother
12 Maternal grandmother
13 Maternal grandfather
14 Aunt - mother’s sister
15 Uncle - mother’s brother
16 Sister - shared mother
17 Brother - shared mother
19 Other - biological mother’s relative
21 Step-mother
22 Foster mother
23 Adoptive mother
24 Maternal GM’s sister (Great Aunt)
25 Maternal GM’s brother (Great Uncle)
26 Maternal GF’s sister (Great Aunt)
27 Maternal GF’s brother (Great Uncle)
31 Father
32 Paternal grandmother
33 Paternal grandfather
34 Aunt - father’s sister
35 Uncle - father’s brother
36 Sister - shared father
37 Brother - shared father
39 Other - biological father’s relative
41 Step-father
42 Foster father
43 Adoptive father
44 Paternal GM’s sister (Great Aunt)
45 Paternal GM’s brother (Great Uncle)
46 Paternal GF’s sister (Great Aunt)
47 Paternal GF’s brother (Great Uncle)
51 Female significant other => 1 yr with family
52 Male significant other => 1 yr with family
53 Female SO < 1 yr with family
54 Male SO < 1yr with family
55 Full bio sister
56 Full bio brother
57 Step-sister (unrelated to either of proband’s BIO parents)
58 Step-brother (unrelated to either of proband’s BIO parents)
59 Random relative NOS
78 Female nonrelative NOS
79 Male nonrelative NOS
96 Missing
97 N/A
98 Unknown
99 Refused
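When the completed forms are entered as data, numeric codes like those above are typically translated back into labels. The sketch below shows one way this might be done for the Race/Ethnicity list; the names `RACE_ETHNICITY`, `NONRESPONSE`, and `decode` are hypothetical, and it is an assumption that codes 96 through 99 mark non-responses for every coded item rather than only for the list they follow.

```python
# Hypothetical lookup tables transcribed from the code lists on this form.
RACE_ETHNICITY = {
    0: "Native American or Alaskan native",
    1: "Asian/Pacific Islander",
    2: "Black/African American, not of Hispanic origin",
    3: "Latino/Hispanic",
    4: "White/Caucasian, not of Hispanic origin",
    5: "Other",
}

# Assumption: these codes flag non-responses for any coded item.
NONRESPONSE = {96: "Missing", 97: "N/A", 98: "Unknown", 99: "Refused"}


def decode(code: int, table: dict[int, str]) -> str:
    """Translate a numeric form code into its label."""
    if code in table:
        return table[code]
    if code in NONRESPONSE:
        return NONRESPONSE[code]
    raise ValueError(f"Unrecognized code: {code}")


if __name__ == "__main__":
    print(decode(2, RACE_ETHNICITY))
    print(decode(99, RACE_ETHNICITY))
```

Raising an error for an unlisted code, rather than returning a default label, keeps data-entry mistakes visible.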
Appendix E: Consent Forms
Appendix F: Statistical Findings
Statistical findings /SPSS 16.0
Sample Demographics.

Characteristic              Value
Age, years
  Median                    12
  Mean                      11.92
  SD                        2.56
  Minimum-Maximum           8-17
Number of Diagnoses
  Median                    3
  Mean                      2.76
  SD                        1.42
  Minimum-Maximum           0-8
Gender, n (%)
  Boy                       48 (64)
  Girl                      27 (36)
Ethnicity, n (%)
  White                     8 (11)
  Black                     63 (84)
  Other                     4 (5)
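The percentages in the demographics table follow from the counts and a total of 75 participants (48 + 27). A minimal Python check, in which the function name `percent_breakdown` is hypothetical and rounding to whole percentages is assumed:

```python
# Recompute the n (%) columns of the demographics table from the counts.
# Assumption: percentages were rounded to the nearest whole number.

def percent_breakdown(counts: dict[str, int]) -> dict[str, int]:
    """Return each category's share of the total as a rounded percentage."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


gender = {"Boy": 48, "Girl": 27}
ethnicity = {"White": 8, "Black": 63, "Other": 4}

if __name__ == "__main__":
    print(percent_breakdown(gender))
    print(percent_breakdown(ethnicity))
```

Both groupings sum to the same total, so the two sets of percentages are consistent with a single sample of 75.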
Author’s Note
This work is in completion of a Ph.D. dissertation for Linda R. Marnic, Department of
Counseling Psychology at West Virginia University. Professional colleagues who were
integral to this work include Eric Youngstrom, Department of Psychology, University of
North Carolina at Chapel Hill; Oren Meyers, Department of Psychiatry, Case Western
Reserve University, Cleveland, Ohio; Andrew Freeman, University of North Carolina at
Chapel Hill; Heather Marcinek, Case Western Reserve University; Frank Ezzo,
Department of Psychology, Applewood Centers, Cleveland, Ohio.
The research was supported in part by a grant funded by the National Institute of Mental
Health in coordination with a study of Bipolar Disorder in Children.
Correspondence regarding this research may be sent to Linda R. Marnic, who is now in
private practice at Family Matters, P.O. Box 490, Lost Creek, West Virginia 26385.