The Development of Cognitive Skills to Support Inquiry Learning
Author(s): Deanna Kuhn, John Black, Alla Keselman, Danielle Kaplan
Source: Cognition and Instruction, Vol. 18, No. 4 (2000), pp. 495-523
Published by: Taylor & Francis, Ltd.
Copyright © 2000, Lawrence Erlbaum Associates, Inc.
Establishing the value of inquiry learning as an educational method, it is argued, rests
on thorough, detailed knowledge of the cognitive skills it is intended to promote.
Mental models, as representations of the reality being investigated in inquiry learning,
stand to influence strategies applied to the task. In the research described here, the
hypothesis investigated is that students at the middle school level, and sometimes well
beyond, may have an incorrect mental model of multivariable causality (one in which
effects of individual features on an outcome are neither consistent nor additive) that
impedes the causal analysis involved in most forms of inquiry learning. An extended
intervention with 6th to 8th graders was targeted to promote (a) at the metalevel, a
correct mental model based on additive effects of individual features (indicated by
identification of effects of individual features as the task objective); (b) also at the
metalevel, metastrategic understanding of the need to control the influences of other
features; and (c) at the performance level, consistent use of the controlled comparison
strategy. Both metalevel advancements were observed, in addition to transfer to a new
task at the performance level, among many (though not all) students. Findings support
the claim that a developmental hierarchy of skills and understanding underlies, and
should be identified as an objective of, inquiry learning.
Requests should be sent to Deanna Kuhn, Box 119, Teachers College, Columbia University,
New York, NY 10027. E-mail: [email protected]
The argument for inquiry learning as an educational tool is being heard increasingly,
especially as the technology and materials to support this kind of educational
experience have expanded and become widely available. Amidst the widespread
enthusiasm, the strongest criticism to be heard is that such methods are inefficient.
Too little substantive knowledge is gained to justify the sizable expenditure of
classroom time that such activities typically consume. But outweighing this criticism
in a majority of educators' eyes are the potential benefits of the opportunities
afforded students to engage in genuine inquiry. Highly favored in a recent National
Research Council report (Bransford, Brown, & Cocking, 1999) is a method in
which students
analyze data and construct evidence.... They then
debate the conclusion that they derive from their evidence. In effect the students build
and argue about theories.... Question posing, theorizing, and argumentation form the
[...] of the students' scientific activity.... The process as a whole provide[s] a
richer, more scientifically grounded experience [...] focus on textbooks
or laboratory demonstrations. (pp. 171-172)
In formulating questions, accessing and interpreting evidence, and coordinating it
with theories, students are believed to develop the intellectual skills that will enable
them to construct new knowledge (Chan, Burtis, & Bereiter, 1997). In addition,
they ideally are also acquiring a set of intellectual values: values that deem activities
of this sort to be worthwhile in general and personally useful. In the words of
Resnick and Nelson-LeGall (1997), students who value intellectual inquiry
believe they have the right (and the obligation) to understand things and make things
work ... believe that problems can be analyzed, that solutions often come from such
analysis and that they are capable of that analysis ... have a toolkit of problem-analysis
tools and good intuitions about when to use them ... know how to ask questions,
seek help and get enough information to solve problems ... have habits of mind that
lead them to actively use the toolkit of analysis skills. (pp. 149-150)
In short, students come to understand that they are able to acquire knowledge they
desire, in virtually any content domain, in ways that they can initiate, manage, and
execute on their own, and that such knowledge is empowering. This outcome is believed
to justify the time devoted to development of these skills and dispositions
within the context of what is typically a circumscribed topic of investigation.
Is inquiry-based education capable of delivering on these promises? We argue
here that the arguments supporting its merits rest on a critical assumption. The
assumption is that students possess the cognitive skills that enable them to engage in
these activities in a way that is profitable with respect to the objectives identified
previously. If students lack the necessary skills, inquiry learning could in fact be
counterproductive, leading students to frustration and to the conclusion that the
world, in fact, is not analyzable and worth trying to understand, a conclusion that
runs exactly opposite to the intellectual values that Resnick and Nelson-LeGall
(1997) argued inquiry learning should promote.
At this point, it is necessary to become specific as to what we are referring to as
inquiry learning because a wide range of educational practices have been described
under this heading. Here, we define inquiry learning as an educational activity
in which students individually or collectively investigate a set of
phenomena, virtual or real, and draw conclusions about it. Students direct their
own investigatory activity, but they may be prompted to formulate questions, plan
their activity, and draw and justify conclusions about what they have learned
(de Jong & van Joolingen, 1998).
Inquiry activities targeted to young children may have simple goals that do not
extend beyond description, classification, or measurement of familiar phenomena.
More typically, however, inquiry activities are designed for older children or
adolescents and have, as their goal, the identification of causes and effects. The
context is typically a multivariable one, such that the goal becomes one of identifying
which variable or variables are responsible for an outcome or how a change in the
level of one variable causes a change in one or more other variables in the system.
Equally important is the identification of noncausal variables, so that these can be
eliminated as sources of influence in understanding how the system functions.
Are students of the elementary and middle school grades (in which inquiry
activities are most commonly introduced) capable of inferring such relations based
on investigations of a multivariable system? There exists little educational
research on students engaged in inquiry learning that would answer this question
directly. Evidence that is available, on the other hand, from the literature on
scientific reasoning suggests significant strategic weaknesses that have implications
for inquiry activity (Klahr, 2000; Klahr, Fay, & Dunbar, 1993; Kuhn, Amsel,
& O'Loughlin, 1988; Kuhn, Garcia-Mila, Zohar, & Andersen, 1995; Kuhn,
Schauble, & Garcia-Mila, 1992; Schauble, 1990, 1996). Strategies, moreover,
even though they have been the focus of attention in scientific reasoning research,
may not be all, or even the most critical element, that is missing. In this article, we
raise the possibility that students at the middle school level, and sometimes well
beyond, have an incorrect mental model that underlies strategic weaknesses, and
that impedes the multivariable analysis required in the most common forms of
inquiry learning. Like many mental models, this model may be resistant to revision.
Numerous lines of cognitive and cognitively oriented educational research emphasize
mental models as vehicles that students employ in coming to understand the
workings of a system (Gentner & Stevens, 1983; Vosniadou & Brewer, 1992).
Such models facilitate (or sometimes interfere with) understanding of how a system
operates. We use the mental model terminology here, however, in a more generic
sense. It is students' mental model of causality itself, we claim, that may be
deficient, rather than a mental model of the workings of any particular causal system.
This incorrect mental model can be contrasted to a normative analysis of variance
(ANOVA) model of causality in a multivariable system, a model in which individual
variables each manifest their individual effects on one or more dependent
variables. Such effects are normally additive, although one effect may in some
cases influence (interact with) the effect of another variable.
If we expect students to understand the operation of a multivariable system,
they must at least understand the concept of additive effects: effects that operate
individually on a dependent variable but that are cumulative (additive) in their
outcomes. A student who possesses this mental model of additive effects can understand
much about a system and in many cases even predict outcomes fairly
accurately without the more sophisticated concept of interaction effects as part of
this model. The deficient mental model we describe here, in contrast, is one in
which neither additive nor interactive effects are understood in a normative way.
The specific situation we refer to here in considering these mental models is one
in which an outcome variable that can assume multiple levels on at least an ordinal
scale (i.e., ordered from less to more of some quantity) is potentially affected by a set
of independent variables, each of which can assume two different levels. For example,
in the work described here, the variables of soil type (sand vs. clay), elevation
(high vs. low), and water pollution (high vs. low) are among five potential features,
and the outcome can assume five different levels, from low flooding (1 ft) to high (5 ft).
To investigate the system, a student has the opportunity to choose desired levels for
each of the features and, once this is done, to observe the resulting outcome. The task
presented to the student is to find out which features make a difference and which do
not make a difference in determining the level of the outcome variable.
Students beginning to investigate such a system often focus exclusively on outcomes,
achieving those deemed desirable and avoiding undesirable outcomes
(Kuhn et al., 1995; Kuhn et al., 1992; Schauble, 1990; Schauble, Klopfer, &
Raghavan, 1991). To make progress beyond an outcome focus, it is necessary to
shift one's attention to what we can call an analysis focus, specifically, analysis
in terms of the effects of individual features. Without the understanding that individual
features will contribute their respective effects to the outcomes, the system
cannot be analyzed and understood.
Consider now the mental model that might characterize the thinking of
sixth-grader Matt (an actual case from the database of the research described here,
although the student's name is changed). We label the five variable features of the
system by number and the respective levels of each feature by the letters a or b.
Matt makes the following claims. Based on observation of the instance
1a2a3a4a5a in conjunction with a positive outcome (01), Matt concludes that all
of these contributed to the good outcome (the site is minimally flooded). In other
words, the sandy soil, the lack of pollution, the high elevation, and so forth, "all
make a difference, because it came out good." Next, Matt examines the instance
1b2b3b4a5a (i.e., the levels of three of the features are changed from what they
were in the first instance and remain the same for the other two features) and
observes a poor outcome (high flooding). This time, Matt says, "None of them made
a difference, it came out bad." We can infer from these statements that Matt is not
using the expression make a difference in the normative way dictated by the analysis
model. Instead, making a difference appears to mean "helping to produce a
good outcome."
[Figure 1 panels: left, Soil: Sand, Water Temperature: Hot; right, Soil: Clay, Water Temperature: Hot]
FIGURE 1 The co-occurrence mental model. Both features are implicated as causal in the
outcome on the left and not implicated in the outcome on the right.
Such a model of multivariable causality accommodates the seeming paradox of
a variable making a difference on some occasions (when the outcome is good) and
not making a difference on others (when the outcome is poor), a state of affairs
that we in fact have found to be common among, and not at all paradoxical for,
many children of this age. In earlier work, for example, children of Matt's age who
observed that sports balls with a certain type of surface produce a good serve half
of the time and a poor serve half of the time, whereas balls with a different surface
type produce the same results, often failed to make the normative inference that
type of surface was noncausal with respect to this outcome variable. Instead, they
concluded that the surface type "sometimes makes a difference" in the quality of
the serve (Kuhn et al., 1988).
Formalizing this mental model, it can be described as stipulating the co-occurrence
of a particular variable level and an outcome as a sufficient condition for implicating
that variable as having played a role in the outcome (or, in the case of a
negative outcome, excluding the variable as having played a role). We refer to this
mental model as a co-occurrence model.
It is important to note that the variable level, not the variable itself, is implicated
as causal in the co-occurrence model. In the depiction in Figure 1, for example, it is
the feature levels sandy soil and hot water (rather than soil type or water temperature,
as features) that are implicated as causal in interpreting the successful outcome
on the left-hand side of the figure. In interpreting the unsuccessful outcome
on the right, the same water temperature is, this time, judged not to make a difference.
Reflecting another form of inconsistency, rather than soil type making a difference,
sand does (but clay does not) make a difference.
Causal attributions, then, fluctuate as functions of the particular constellation of
feature levels that are present in a particular instance. Each constellation is a
unique event (even though its components may be incompletely identified). Rather
than representing a genuine interactive model, however, the co-occurrence model
reflects failure to conceptualize even the main effects (of features as variables) on
which statistical interaction effects are founded.
It should be noted finally that in addition to being inconsistent, effects of individual
features are not additive in the co-occurrence model. Because co-occurrence
of a particular feature level and an outcome is a sufficient condition for
attributing causality, any co-occurring feature level may be implicated in what is
regarded as a successful outcome. Implication of more co-occurring feature levels
might be expected to produce an even more successful outcome, yet even one
co-occurring feature is sufficient to explain even the most successful outcome.
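The co-occurrence rule described above can be written out as a small sketch. This is only an illustration of the reasoning pattern attributed to students such as Matt, not a procedure from the study; the function name `cooccurrence_attribution` and the dictionary encoding of an instance are assumptions introduced here. The two instances mirror the two panels of Figure 1.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: feature name -> level present in the instance.
Instance = Dict[str, str]


def cooccurrence_attribution(instance: Instance, outcome_is_good: bool) -> Set[Tuple[str, str]]:
    """Return the feature levels a co-occurrence reasoner would call causal.

    Under the co-occurrence model, every level present alongside a good
    outcome "makes a difference", and no level present alongside a poor
    outcome does. No comparison between instances is involved.
    """
    if outcome_is_good:
        return set(instance.items())
    return set()


# The two panels of Figure 1 (left: good outcome, right: poor outcome).
left = {"soil_type": "sand", "water_temperature": "hot"}
right = {"soil_type": "clay", "water_temperature": "hot"}

implicated_left = cooccurrence_attribution(left, outcome_is_good=True)
implicated_right = cooccurrence_attribution(right, outcome_is_good=False)

# The same level (hot water) is judged causal in one instance and not the other.
hot_water = ("water_temperature", "hot")
inconsistent = hot_water in implicated_left and hot_water not in implicated_right
```

The sketch makes the inconsistency visible: the attribution depends only on the outcome of the single instance at hand, so the same feature level can be judged causal on one occasion and noncausal on the next.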
Mental models, as noted previously, may be resistant to change, and it is not clear
what the most effective way might be to effect a transition from a co-occurrence to a
genuine analysis model of multivariable causality. In previous research (Kuhn et
al., 1995; Kuhn et al., 1992), we focused on the investigatory strategies students use
and the resulting validity of their inferences. To make a valid inference, it is necessary
to make a controlled comparison between two instances that differ only with
respect to a single feature that is the focus of analysis. In research on scientific
reasoning, the lion's share of attention has gone to this controlled comparison, or "all
other things equal" investigation strategy, as the hallmark of skilled scientific
reasoning (DeLoache, Miller, & Pierroutsakos, 1998; Klahr, 2000; Kuhn et al., 1988;
Zimmerman, 2000). The investigator needs to recognize that to conduct a sound
test of the effect of one variable, all other variables must be held constant, so that the
effects of these other variables do not influence the outcome.
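As a minimal sketch of the controlled comparison criterion just stated, the check below tests whether two instances differ on the focal feature and on nothing else. The function names and the dictionary encoding are assumptions made for illustration; the example instances use the number-and-letter labels of Matt's two records (1a2a3a4a5a and 1b2b3b4a5a).

```python
from typing import Dict, List


def differing_features(first: Dict[str, str], second: Dict[str, str]) -> List[str]:
    """List the features whose levels differ between two instances."""
    if first.keys() != second.keys():
        raise ValueError("both instances must describe the same features")
    return sorted(name for name in first if first[name] != second[name])


def is_controlled_comparison(first: Dict[str, str], second: Dict[str, str], focal: str) -> bool:
    """True only if the focal feature is the single feature that differs."""
    return differing_features(first, second) == [focal]


# Matt's two instances: three features change at once.
matt_first = {"1": "a", "2": "a", "3": "a", "4": "a", "5": "a"}
matt_second = {"1": "b", "2": "b", "3": "b", "4": "a", "5": "a"}

# A hypothetical alternative second instance that changes feature 1 only.
controlled_second = {**matt_first, "1": "b"}

confounded = differing_features(matt_first, matt_second)              # ["1", "2", "3"]
matt_ok = is_controlled_comparison(matt_first, matt_second, "1")      # False
fixed_ok = is_controlled_comparison(matt_first, controlled_second, "1")  # True
```

Any difference in outcome between Matt's two instances could stem from feature 1, 2, or 3, so no valid inference about a single feature follows from that pair.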
In our research (Kuhn et al., 1995; Kuhn et al., 1992), we have found that use of
a controlled comparison strategy and the valid inferences that result from it increase
in frequency over a period of months among preadolescents when they are
given the opportunity to engage in self-directed investigatory activity of a
multivariable system. Some students, however, even after many weeks of investigation,
remain stubbornly fixed at a level of confounded investigations and fallacious
inferences. The mental model ideas proposed here suggest a possible reason
for their lack of progress.
The analysis model of additive effects of individual variables is a logical
prerequisite to the controlled comparison investigative strategy. This is so because the
purpose of the latter is identification of the effect of a single variable. If one's mental
model is not one of individual additive effects, neither attribute of the controlled
comparison strategy is compelling. The "comparison" attribute is not
compelling, given that it entails comparing the outcomes associated with two (or
more) levels of a variable for the purpose of assessing the effect of that variable on
outcome. Furthermore, the "controlled" attribute is even less compelling because
it is the individual effects of other variables that need to be controlled. As we
suggested previously, then, an incorrect mental model may underlie the strategic
weaknesses that have been observed and impede the multivariable analysis central
to inquiry learning.
As a procedure, the controlled comparison strategy is straightforward to teach
("Keep everything else the same and just change one thing"). By comparison, it is
not easy to change mental models, and this would seem particularly so of the sort
of generic model (of multivariable causality) that we discuss here. A number of
studies over the years have undertaken teaching the use of the controlled comparison
procedure in brief training sessions (Case, 1974; Chen & Klahr, 1999) with
some degree of success, but such interventions are unlikely to effect change in
underlying mental models of causality.
In our research (Kuhn & Angelev, 1976; Kuhn & Ho, 1980; Kuhn, Ho, & Adams,
1979; Kuhn & Phelps, 1982; Kuhn et al., 1988; Kuhn et al., 1995; Kuhn et al.,
1992), we have focused on longer term interventions (typically 8-10 weekly sessions),
with an objective of promoting not just change in the strategies students use
to acquire new knowledge about a causal system (referred to later as knowing
strategies), but enhancement of their metastrategic understanding of why these are
the strategies that must be used and why others will not suffice. Execution of the
controlled comparison strategy, as just noted, is relatively easy to teach, but it is
metastrategic understanding that determines whether the strategy will be selected
when the student is engaged in self-directed activity (Kuhn, in press-c).
The argument we make here is that this metastrategic understanding requires a
correct mental model of how a multivariable causal system (again, in the generic
sense of any causal system) operates. A strategy that has the purpose of assessing the
effect of an individual feature will not be understood and valued unless one's mental
model of the operation of a multivariable system is based on the additive effects of
individual features. Once this analysis mental model of individual additive effects is
attained, the learner is in a position to proceed to a more complex analysis model in
which these individual effects are interactive in their influence on outcomes. In the
absence of this analysis mental model in which individual variables assert their
respective effects on an outcome in an additive manner, the controlled comparison
strategy for assessing these effects can be taught, but its logic will not be compelling;
there will not be a deep level of understanding as to why it must be used.
One way to formalize this deep level of understanding as a construct is to postulate a
metalevel of operation that is distinct from the performance level (Figure 2). The
knowing strategies depicted in Figure 2 are those we regard as central to inquiry
activity. The metalevel is the level at which particular knowing strategies are selected
for use and their application monitored and the results interpreted (left-hand side of
Figure 2). Understanding why to use a strategy, then, occurs at the metalevel.
Moreover, it is this metalevel understanding that should govern not only the use of a
strategy but its generalization to a new context in which it is applicable (Crowley &
Metalevel understanding, we can hypothesize, develops in parallel with strategic
competence in a mutually facilitative relation. Exercise of strategies at the
performance level feeds back and enhances the metalevel understanding that will
guide subsequent strategy selection and, hence, performance. In other words,
metalevel understanding both informs and is informed by strategic performance
(Figure 2; see also Sophian, 1997).
Strategies exist only in relation to goals or objectives. Therefore, metalevel
understanding of task objectives (metatask understanding) is as critical as
metastrategic understanding of the strategies that are available to apply to the task
(Kuhn & Pearsall, 1998; Siegler & Crowley, 1994). Both must be present and
coordinated to guide performance successfully. The mental model of additive effects
of individual variables, we have claimed, is essential for the controlled comparison
investigative strategy. We can now elaborate that specifically it is necessary to
metatask understanding of the task objective of identifying effects of individual
variables. Without this understanding, the appropriate controlled comparison
strategy will not be consistently selected.
[Figure 2 diagram: the phases INQUIRY, ANALYSIS, INFERENCE, and ARGUMENT, linked to
meta-level procedural understanding (What do knowing strategies accomplish? When, where,
why to use them?), meta-level declarative understanding (What is knowing? Facts, opinions,
claims; theory, evidence), and competence and disposition to apply.]
FIGURE 2 Phases of inquiry activity, with hypothesized bidirectional relations between the
metalevel and the performance level.
Note. From "How Do People Know?" by D. Kuhn, in press, Psychological Science. Copyright
1999 by Blackwell. Reprinted with permission.
In the research presented in this article, we examine the extent to which the
mental model transition (from an incorrect to correct model of multivariable causality)
that is discussed here is facilitated by metalevel exercise that occurs in addition
to and in conjunction with performance-level exercise of strategies. In past
work, we have undertaken to promote the development of metalevel understanding
by externalizing it in collaborative discussion among peers, a method that
works under certain conditions (Kuhn, in press-c). Another method is to engage
students more directly in metalevel exercise by asking them to evaluate different
potential strategies that could be applied to a problem. The contemplation of alternative
strategies should promote not only attention to task objectives but also the
essential task of coordinating task objectives with available strategies. This direct
approach, we have found, also meets with some success (Pearsall, 1999).
It is this latter approach that is used in the work presented here, but we do so
with a particular focus on the question of whether it will promote the transition to
the more correct additive mental model of causality. As part of the metastrategic
evaluation exercise, students are presented the situation of two individuals who
disagree as to the effect of a particular feature, with one individual, for example,
claiming that soil type makes a difference and the other claiming that it does not.
The students must then consider and evaluate the strategies that could be used to
resolve the conflict. Note that the conflict is explicitly identified as one about the
effect of a particular, individual feature. To what extent, we asked, would extended
experience with the evaluation of such conflicts promote (a) at the metalevel, a
mental model based on the effects of individual features, reflected in metatask
understanding that the object of the activity is identification of effects of individual
features; (b) metastrategic understanding of the need to control the influences of
other features (the controlled comparison strategy); (c) at the performance level,
successful use of the controlled comparison strategy; (d) resulting valid inferences
regarding the status of causal and noncausal features in the system; and (e) superior
acquisition of knowledge about the system, reflected in correct conclusions
about its causal structure? Our past research indicated that performance-level exercise
of investigative activity (with no feedback beyond that provided by the student's
own activity) over a period of weeks is sufficient to induce some change on
at least some of these dimensions among a majority of students. We, therefore,
compare two conditions: one in which students engage only in this performance-level
exercise and another in which students also engage in the metalevel
exercise, described more fully subsequently.
Participants were 42 middle school (6th, 7th, and 8th grade) students attending an
urban public school. They came from two comparable intact science classes of
mixed-grade (6th-8th) level. Each class participated over the same several-month
period as part of their science curriculum. One class was arbitrarily chosen to serve
as an experimental group, and the other class served as a control group. The former
group consisted of 10 boys and 11 girls, and the latter had 12 boys and 9 girls.
Students were of diverse ethnicity, with the majority being African American or Hispanic.
Task Environment
The main task, which students engaged repeatedly both individually and in dyads
during the course of the study, is a multimedia research program, created with the
Macromedia Director authoring tool. The program supports self-directed investigation
of a multivariable environment consisting of a set of instances available for
investigation, with instances defined by five variable features and an outcome, the
degree of flooding of a building site.
Students are placed in the role of builders working for TC Construction Company,
which builds cabins along the shore of a series of small lakes. The area is
susceptible to flooding, and the cabins are, therefore, built on supports that raise
them above the ground. It is the student's task to identify the optimum height of the
supports for various buildings. It is explained in the introductory online presentation
that the supports should be neither higher than necessary to avoid unnecessary
building expense nor lower than necessary to avoid flooding and resulting damage
to the building. Students are given a bank account at the beginning of their work,
with money subtracted for incorrect predictions (of how much flooding will occur
at that site and, therefore, how high the supports need to be built) and a bonus
received for correct predictions.
The only way for students to generate correct predictions is to investigate effects
of the five variable features on amount of flooding and draw appropriate inferences.
Following an introductory session in which the program is introduced
and students' initial beliefs assessed regarding the five variable features that may
influence flooding, the student embarks on a series of investigatory sessions. The
program includes the following sequence of activities: statement of investigatory
intent (students indicate which features they intend to find out about), selection of
feature levels in instances to be examined, prediction of outcomes, the opportunity
to make inferences and justify them, and the option of making notes in an online
notebook. During the second and subsequent sequences, the feature levels and outcome
of the immediately preceding instance remain visible to facilitate comparisons.
The sequence is repeated five times during each session. At the end of a
session, students are asked to draw conclusions about the causal and noncausal effects
operating in the system. Students' activity within the program is tracked and
recorded into word processing files by the program.
TABLE 1
Causal Structure of Flood Problem
Water pollution (high or low)       No effect
Water temperature (hot or cold)     Cold raises the flood level 1 ft
Soil depth (deep or shallow)        Shallow raises the flood level 2 ft
Soil type (clay or sand)            Sand reduces the flood level 1 ft for deep soil only
Elevation (high or low)             No effect
The causal structure of the task environment is shown in Table 1. Two of the
five features are noncausal (i.e., have no effect on outcome). The other three features
are causal, with an interactive effect between two features.
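The causal structure in Table 1 can be sketched as a small function, which makes the two additive effects and the single interaction easy to inspect. This is an illustrative reconstruction, not the study's program: the function name, the argument names, and in particular the 2 ft baseline are assumptions. The baseline is chosen here only because it makes the outcomes span the 1 ft to 5 ft range mentioned earlier in the article.

```python
from itertools import product

BASELINE_FT = 2  # assumption: not stated in Table 1, chosen so outcomes span 1-5 ft


def flood_level(water_temperature: str, soil_depth: str, soil_type: str,
                water_pollution: str = "low", elevation: str = "low") -> int:
    """Flood level in feet implied by the effects listed in Table 1."""
    level = BASELINE_FT
    if water_temperature == "cold":
        level += 1  # cold raises the flood level 1 ft
    if soil_depth == "shallow":
        level += 2  # shallow raises the flood level 2 ft
    if soil_type == "sand" and soil_depth == "deep":
        level -= 1  # sand reduces the level 1 ft, for deep soil only (interaction)
    # water_pollution and elevation are noncausal and never enter the computation
    return level


# Enumerate all 32 instances to see which outcome levels can occur.
all_levels = {
    flood_level(temp, depth, soil, pollution, elev)
    for temp, depth, soil, pollution, elev in product(
        ("hot", "cold"), ("deep", "shallow"), ("clay", "sand"),
        ("high", "low"), ("high", "low"))
}

# Controlled comparisons on soil type: the effect appears only for deep soil.
soil_effect_deep = flood_level("hot", "deep", "clay") - flood_level("hot", "deep", "sand")
soil_effect_shallow = flood_level("hot", "shallow", "clay") - flood_level("hot", "shallow", "sand")
```

Under these assumptions, a controlled comparison on soil type yields a 1 ft difference when soil is deep and none when it is shallow, which is the interactive effect the text refers to, while varying pollution or elevation alone never changes the outcome.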
A second task was employed as a transfer task, to assess the generality of
changes in students' strategies and understanding as a function of their work on the
main task. The transfer task was identical to the main task in structure and computer
interface. The content involved the effects of various features on job applicants'
potential effectiveness as a teacher's aide in a classroom.
Pretest assessment. Students from both classes participated in individually
administered pretests. Following introduction of the program and assessment
of initial beliefs, the student repeated the investigatory cycle (selections of feature
levels, prediction of outcome, inference, and justification) five times in the initial
investigatory session. Students worked one-on-one with a researcher during that
session, so that any questions or misunderstandings could
be addressed. An identical pretest assessment was administered for the transfer task.
Performance-level exercise. A 2-week school vacation intervened between
completion of pretest assessments and commencement of the main phase of
the study. During that phase, participants worked in changing dyads in a series of 9
to 10 sessions that took place over a period of roughly 6 weeks, with an average of
two sessions per week (and a range of 1-3, due to absences and scheduling constraints).
Dyads were formed with the aim of avoiding, as
far as possible, pairing of the same two students for more than one session. At the
beginning of the pair sessions, students were instructed to work collaboratively
rather than in turns, to discuss their views as to how to proceed or what to conclude,
and not to proceed until some agreement was reached. At each session, the pair
worked collaboratively on the flood task, with an adult available for consultation if
problems arose, but the adult otherwise did not intervene.
TABLE 2
Sample Metalevel Exercise
This is Terry and Jamie's work:
[Record display: feature levels and flooding outcome of the records examined]
Terry says soil type does make a difference.
What can settle the argument between them?
Do the records they looked at say anything about whether soil type does or does not make a
difference? (circle one)
    Yes    No
What do the records suggest?
    Soil type makes a difference
    Soil type does not make a difference
What was different about this record and the last record they looked at?
    Were they different on soil type?             Same    Different
    Were they different on water pollution?       Same    Different
    Were they different on water temperature?     Same    Different
    Were they different on soil depth?            Same    Different
    Were they different on elevation?             Same    Different
Did the two records have different amounts of flooding? (circle one or more)
    Because of soil type
    Because of water pollution
    Because of water temperature
    Because of soil depth
    Because of elevation
Do the records they looked at say anything about whether soil type does or does not make a
difference? (circle one)
    Yes    No
What do the records suggest?
    Soil type makes a difference
    Soil type does not make a difference
What grade would you give Terry and Jamie on their work? (circle one)
Why do they deserve this grade?
And they wanted to find out FOR SURE if soil type makes a difference.
What record should they look at next, to be sure? (Circle your choices.)
If the second record comes out different from the first, what will the reason be?
Metalevel exercise. In addition, students in the experimental condition engaged in a series of paper-and-pencil exercises related to the flood task, which they worked on in pairs within the classroom, twice each week for the duration of the period that they were working on the flood program. Pairing varied across occasions, and students were instructed to work together and agree on an answer before writing it down. Students completed one exercise per session. A sample exercise is shown in Table 2. In that example, the comparison is confounded (the record shown differs from the previous record with respect to two features) and the outcome varies. In other cases, the comparison was controlled and the outcomes either varied or remained constant.
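The distinction between a controlled and a confounded comparison can be made concrete with a minimal sketch. The five feature names are those of the flood task; the level names (such as "sand" and "clay"), the function names, and the example records are hypothetical and are not taken from the study materials.

```python
from typing import Dict, List

# A record maps each feature name to its level. Feature names come from the
# flood task; the level names used below are hypothetical.
Record = Dict[str, str]


def differing_features(a: Record, b: Record) -> List[str]:
    """Return the features whose levels differ between two records."""
    return sorted(name for name in a if a[name] != b[name])


def classify_comparison(a: Record, b: Record) -> str:
    """Label a pair of records by how many features differ between them."""
    n = len(differing_features(a, b))
    if n == 0:
        return "identical"
    return "controlled" if n == 1 else "confounded"


first = {
    "soil type": "sand",
    "water pollution": "low",
    "water temperature": "cold",
    "soil depth": "shallow",
    "elevation": "low",
}
one_change = {**first, "soil type": "clay"}
two_changes = {**first, "soil type": "clay", "soil depth": "deep"}

print(classify_comparison(first, one_change))
print(classify_comparison(first, two_changes))
```

Only the pair that differs on a single feature permits a difference in outcome to be attributed to that feature, which is the point the exercises were designed to bring out.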
Posttest assessment. The posttest assessment [...] and duplicated the pretest assessment. Posttest assessments took place during the 2 weeks following completion of the intervention period.
Delayed posttest assessment of metalevel understanding. Approximately 1 week following the completion of posttest assessments, a paper-and-pencil measure was administered during class time by the classroom teacher in each of the classes. The researchers were not present during this administration. One student in the experimental condition and 5 students in the control condition were absent on the administration day and did not receive this assessment.
This measure was designed to assess metatask understanding of the task goal (identifying effects of individual features) and metastrategic understanding of the critical strategy (controlled comparison) that allowed this goal to be met. To serve as the most rigorous test of understanding, this measure was based on the content of the transfer (teacher aide) task rather than on the content of the main task (used in the intervention activities). The student was asked which of two records would be the better one to look at next: Pat's choice (which represented a controlled comparison relative to the initial record available) or Lee's choice (which represented a confounded comparison with respect to two features). The student was asked to justify why this was "a better plan for finding out." In addition, the student was [...]
Prediction error. A quantitative measure of performance is the degree of error in predicting outcomes. Average prediction error decreased from 1.23 errors at the pretest to 0.96 errors at the posttest (with one unit of error equaling a mismatch of 1 ft. between the predicted level of flooding and the actual level). This decline was significant, F(1, 40) = 4.54, p = .039, and did not differ by experimental condition.
Mean prediction error on the transfer task similarly decreased from 1.05 at the pretest to 0.74 at the posttest. This difference was also significant, F(1, 40) = 7.87, p = .008, and did not differ by experimental condition. Thus, students in both groups learned something about the causal system that was observable in their performance.
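As an illustration of the prediction error measure, the following sketch averages the absolute mismatch, in feet, between predicted and actual flood levels. The text states only that one unit of error equals a 1 ft. mismatch; averaging absolute differences over predictions is an assumption, and the function name and the numbers are hypothetical.

```python
from typing import Sequence


def mean_prediction_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute mismatch (in feet) between predicted and actual flooding.

    Assumption: error is scored as |predicted - actual| per prediction.
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total / len(predicted)


# Hypothetical predictions for four records (flood level in feet).
predicted_levels = [3, 2, 4, 1]
actual_levels = [2, 2, 5, 1]
print(mean_prediction_error(predicted_levels, actual_levels))
```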
Valid inference. A more qualitative picture of performance is provided by [...] of controlled comparison is not straightforward to assess because students did not always make the appropriate comparisons, even when they had selected for examination data that would allow them to make an informative comparison. Therefore, we were conservative in assessment of use of the controlled comparison strategy, judging it present only when students drew a justified inference, that is, drew a correct conclusion based on comparison of two instances that they had generated and that they referred to in justifying the conclusion.
The number of inferences justified by an appropriate controlled comparison of two instances (henceforth called valid inferences) was examined relative to number of possible inferences. This proportion of valid inferences was calculated for each student for the main and transfer tasks at pre- and posttest assessments. As seen in Table 3, patterns are similar for the two tasks. Students in both conditions show a low level of valid inference at the pretest, and both groups show improvement from pretest to posttest, with the experimental group showing somewhat greater improvement than the control group. The proportions summarized in Table 3 were subjected to arcsine transformation and analyzed by a repeated measures ANOVA with time of testing a within-subjects factor and experimental condition a between-subjects factor. For the main task, time of testing was significant, F(1, 41) = 20.58, p < .001, but neither condition nor the interaction of time and condition reached significance. For the transfer task, time of testing was significant, F(1, 41) = 19.21, p < .001, and the Time × Condition interaction was marginally significant, F(1, 41) = 2.84, p = .10.
TABLE 3
Proportion of Valid Inferences
Group                      Pretest   Posttest
Main task
  Experimental group^a
    M                      .06       .45
    SD                     .11       .42
  Control group^a
    M                      .12       .33
    SD                     .19       .42
  Total group^b
    M                      .09       .39
    SD                     .15       .42
Transfer task
  Experimental group^a
    M                      .00       .43
    SD                     .00       .51
  Control group^a
    M                      .10       .29
    SD                     .30       .46
  Total group^b
    M                      .05       .36
    SD                     .26       .48
^a N = 21; ^b N = 42.
TABLE 4
Mean Number of Inferences per Instance Examined
Group                      Pretest   Posttest
Main task
  Experimental group^a
    M                      3.77      3.33
    SD                     1.32      1.66
  Control group^a
    M                      3.98      4.02
    SD                     1.11      1.32
  Total group^b
    M                      3.88      3.67
    SD                     1.21      1.51
Transfer task
  Experimental group^a
    M                      3.86      2.64
    SD                     1.57      1.92
  Control group^a
    M                      3.64      3.50
    SD                     1.57      1.86
  Total group^b
    M                      3.75      3.07
    SD                     1.55      1.92
^a N = 21; ^b N = 42.
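For readers unfamiliar with the arcsine transformation applied to the proportions of valid inferences, a minimal sketch follows. It assumes the common form arcsin(sqrt(p)); the text does not state which variant was used (some analyses use twice this value), and the function names and counts are hypothetical.

```python
import math


def valid_inference_proportion(valid: int, possible: int) -> float:
    """Proportion of possible inferences that were valid."""
    if possible <= 0 or not 0 <= valid <= possible:
        raise ValueError("require 0 <= valid <= possible and possible > 0")
    return valid / possible


def arcsine_transform(p: float) -> float:
    """Arcsine square-root transform of a proportion (assumed variant)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))


# Hypothetical student: 1 valid inference out of 4 possible.
p = valid_inference_proportion(1, 4)
print(p, arcsine_transform(p))
```

The transform stretches values near 0 and 1, which is why it is often applied before an ANOVA on proportions that cluster at the extremes, as the pretest values here do.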
A decline in the number of inferences made also reflects improved performance. A student who declines to make an inference (choosing the "haven't found out" option) recognizes that the evidence he or she has generated does not allow for a definitive conclusion. The average number of inferences made per session was overall slightly below four (of a possible five). As seen in Table 4, this number declined noticeably only among the experimental group and more so on the transfer task than the main task. A repeated measures ANOVA yielded no significant effects for the main task. On the transfer task, however, the effects of both time, F(1, 41) = 7.01, p = .01, and the Time × Condition interaction, F(1, 41) = 4.37, p = .04, were significant.
Some additional insight is gained by qualitative examination of patterns of change from pretest to posttest. These are summarized in Table 5, which shows the distribution of students showing no valid inference, a mixture of valid and invalid inference, and all valid inference at the two times for the main task. As seen in Table 5, the majority of students show no valid inference at the pretest, and just less than half do not improve in this respect. Improvement, however, is more frequent in the experimental group. Mixture of valid and invalid inference is a common pattern at both times, consistent with previous research (Chen & Klahr, 1999; Crowley & Siegler, 1999; Kuhn et al., 1995). Results for the transfer task are similar, with slightly lower frequencies of valid inference usage at the posttest (9 students in the experimental group and 6 in the control group showing some or all valid inferences).
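The three-way categorization of students (no, some, or all valid inferences) can be expressed as a small rule. In this sketch each inference a student made is represented by a flag indicating whether it was valid; that representation, the function name, and the treatment of a student who made no inferences at all as "no valid inferences" are assumptions.

```python
from typing import Sequence


def inference_pattern(valid_flags: Sequence[bool]) -> str:
    """Categorize a student's inferences as none, some, or all valid."""
    if not valid_flags or not any(valid_flags):
        return "no valid inferences"
    if all(valid_flags):
        return "all valid inferences"
    return "some valid inferences"


# Hypothetical student who made three inferences, one of them valid.
print(inference_pattern([False, True, False]))
```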
Understanding inferred from performance. An indirect measure of students' understanding of the task objective is provided by their responses to the query regarding which features they intended to find out about, posed at the beginning of each investigative sequence. Did students understand the need to focus their investigative efforts on a single feature at a time? If so, this understanding should be reflected in answers to this question. A decline in the number of features for which a student expressed an intent (to investigate) in examining a single instance of evidence should reflect increased understanding of the need to focus on single features. Therefore, we compared mean number of intents (to investigate a feature) per instance at pretest and posttest assessments.
These means are shown in Table 6 for the two conditions and times of testing. As seen there, despite differences attributable to chance at pretest, number of intents declines over time, with the most sizable decline in the experimental group on the main task. An ANOVA yielded significant effects for the main task for both time, F(1, 41) = 60.94, p < .001, and the Time × Condition interaction, F(1, 41) = 6.75, p = .013. For the transfer task, only the effect for time was significant, F(1, 41) = 5.92, p = .02. Also relevant is the number of students for whom the mean number of intents declined to less than 2, indicating that at least some of the time this student had the intent of investigating a single feature. At the posttest, these frequencies were 15 (71% of participants) for the experimental group and 12 (57%) for the control group on the main task. This difference was not maintained in the transfer task, however. Frequencies were 11 (52%) and 14 (67%) for experimental and control groups, respectively.
TABLE 5
[...] of Participants by Patterns of Valid Inferences
Group                  No Valid Inferences   Some Valid Inferences   All Valid Inferences
Experimental group
  Pretest              16                    5                       0
  Posttest             8                     7                       6
Control group
  Pretest              14                    7                       0
  Posttest             12                    4                       5
TABLE 6
Mean Number of Intents per Instance Examined
Group                      Pretest   Posttest
Main task
  Experimental group^a
    M                      3.74      2.00
    SD                     0.87      0.72
  Control group^a
    M                      3.05      2.24
    SD                     0.89      1.08
  Total group^b
    M                      3.40      2.12
    SD                     0.94      0.91
Transfer task
  Experimental group^a
    M                      3.10      2.45
    SD                     1.5       1.41
  Control group^a
    M                      2.57      2.02
    SD                     1.12      1.16
  Total group^b
    M                      2.83      2.24
    SD                     1.33      1.29
^a N = 21; ^b N = 42.
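The percentages reported for students whose mean number of intents fell below 2 follow directly from the group size of 21 per condition. A minimal check, with a hypothetical helper name, is:

```python
GROUP_SIZE = 21  # participants per condition, as reported in the table notes


def percent_of_group(count: int, group_size: int = GROUP_SIZE) -> int:
    """Express a frequency as a whole-number percentage of the group."""
    return round(100 * count / group_size)


# Frequencies reported in the text: main task, then transfer task.
for label, count in [
    ("experimental, main task", 15),
    ("control, main task", 12),
    ("experimental, transfer task", 11),
    ("control, transfer task", 14),
]:
    print(label, percent_of_group(count))
```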
Relation of understanding to strategies. Qualitative analysis of patterns of performance indicated that focus on a single feature at a time as an investigatory intent at the posttest was associated with better strategies at the performance level. Of 10 participants (6 experimental and 4 control) who showed consistent single-feature investigatory intent at the posttest, all displayed valid inference at the posttest. Of the 6 experimental and 12 control participants who displayed no valid inference, conversely, none displayed single-feature investigatory intent. These students either intended to investigate multiple features at once, shifted their intent from one feature to another before the necessary evidence had been generated with respect to the first feature, or expressed no investigatory intent ("didn't know" what they were going to find out).
Direct assessment of understanding. The metalevel assessment measure was designed to provide a direct measure of what participants understood at the final assessment with respect to (a) the objective of the task and (b) why controlled comparison was the best strategy for achieving that objective. Both of these were assessed in a content domain other than the one in which students had had exercise.
Students who scored at the highest level (Level 3) chose Pat's plan (which allows unconfounded comparison) as the better one and were able to answer both questions about Pat's plan correctly: why it is better than Lee's plan (metastrategic understanding) and what Pat is intending to find out (metatask understanding). Typical of the correct answers to the first question were "because he only changed one thing" or "because everything is the same except age," although a few students showed very clear metastrategic understanding reflected in an answer such as "If you change only one and it makes a difference then you know what made the change." Typical of the correct answers to the second question were "if age makes a difference" or "if an older or younger teacher aide is better."
Students categorized as Level 2 chose Pat's plan as the better one but answered only one of the questions correctly, responding "I don't know" to the other or giving a vague answer (e.g., "She'll find out if her plan is better than Lee's").
Students categorized as Level 1 chose Pat's plan as the better one but offered no relevant justification (e.g., "Pat's plan is better because being a parent means she knows how to take care of her students").
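The scoring levels just described can be summarized as a decision rule. The sketch below is an interpretation rather than the authors' coding scheme: Levels 1 through 3 follow the descriptions above, whereas treating Level 0 as failure to choose Pat's plan is an assumption, and the function and parameter names are hypothetical.

```python
def metalevel_score(chose_pat: bool, strategy_correct: bool, task_correct: bool) -> int:
    """Assign a metalevel assessment level from three coded judgments.

    chose_pat:        student picked Pat's (controlled) plan as the better one
    strategy_correct: correctly explained why Pat's plan is better (metastrategic)
    task_correct:     correctly stated what Pat intends to find out (metatask)
    """
    if not chose_pat:
        return 0  # assumption: Level 0 means Pat's plan was not chosen
    correct_answers = int(strategy_correct) + int(task_correct)
    if correct_answers == 2:
        return 3
    if correct_answers == 1:
        return 2
    return 1  # chose Pat's plan but gave no relevant justification


# Hypothetical student who chose Pat's plan and answered one question correctly.
print(metalevel_score(True, True, False))
```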
Table 7 shows the number of students in each group categorized at each level. All students in the experimental group, it is seen, recognized Pat's plan as better, and all but 2 students in the control group did so. The number of students who were able to justify the superiority of Pat's plan in meeting the task objectives, however, is significantly higher among students in the experimental group: 55% versus 38%, χ²(1, N = 36) = 7.60, p < .01. These results suggest that (a) overall, students' implicit understanding (reflected in the correct choice of Pat's plan) outstrips their explicit understanding (reflected in their justifications of the choice); and (b) the experimental condition facilitates the development of metalevel understanding.
Results also indicate that metastrategic understanding may remain incomplete even among students who show considerable understanding by correctly answering the two questions described. In response to the question "What will Lee find out with Lee's plan?" only 4 students in the experimental group and 5 students in the control group answered correctly, typically by identifying the limitation of the noncontrolled comparison strategy (e.g., "She won't find out anything because she won't know what caused the change"). The less common answer "He'll find out if anything matters" was also counted as correct. Others, when asked about Lee's plan, not only did not acknowledge its inferiority (e.g., "She will find out the same") but also indicated potentially productive outcomes of the plan (e.g., "She'll find out if the totally opposite person will make a difference"). The latter response, we would claim, invokes the faulty co-occurrence mental model of analysis via feature levels, rather than features.
TABLE 7
[...] of Students at Each Level of Performance on the Metalevel Assessment Measure
Group                 Level 3   Level 2   Level 1   Level 0
Experimental group    11        1         8         0
Control group         6         3         5         2
Two measures of the posttest knowledge that students exhibited about the system following their investigations were examined. One was the total number of features they implicated as causal in interpreting outcomes. The other was the correctness of their conclusions as to which of the features were causal and which were noncausal.
At the pretest for the main task, students implicated a mean of 2.69 features as having causal status (compared to the correct number of 3). Following their investigations with the flood program, the mean number of features implicated declined to 2.22, a significant decrease, F(1, 39) = 4.68, p = .037. (This decrease did not differ significantly across conditions.) In this respect, then, students became less correct following investigation.
However, this conclusion must be tempered by the knowledge that students displayed as to which features were causal and which were noncausal. These findings are examined only for the main task. (Students' knowledge would not be expected to increase appreciably in the transfer task, given their limited exposure to it.) With respect to both noncausal features (water pollution and elevation), there was an increase from pretest to posttest in the number of correct conclusions, indicating improved knowledge about the causal system. Many students, however, maintained their incorrect beliefs that these features had causal status. For water pollution, the number of students exhibiting correct conclusions increased from 10 to 26 (of a total group of 42). For elevation, the number increased from 12 to 18. With respect to the causal feature water temperature, correct conclusions regarding the direction and nature of its causal status increased from 8 at the pretest to 18 at the posttest. (The remainder most commonly judged the feature noncausal, although a few students judged it causal but in the incorrect direction, or chose an "it depends" option.) Similarly, correct conclusions regarding the soil type feature increased from 9 at the pretest to 19 at the posttest, with most of the remaining students judging the feature noncausal, but 1 student correctly stipulated an interaction effect with soil depth. Soil type was initially (and correctly) judged causal by the largest number of students (23). This number increased to 33 at the posttest, with a few students nonetheless retaining incorrect beliefs. Thus, students' interaction with the program over time enabled both groups to increase their knowledge of the causal system. The retention of incorrect beliefs, despite the substantial amount of evidence each participant generated, however, was common and did not differ significantly across conditions.
Increasingly, "authentic" scientific activity is being promoted as a model of good science education (Bransford et al., 1999; Cavalli-Sforza, Weiner, & Lesgold, 1994; Eisenhart, Finkel, & Marion, 1996; McGinn & Roth, 1999; Palincsar & Magnusson, in press). Such activity is contrasted to the allegedly more superficial observation, description, and laboratory exercises with well-known outcomes that long have been the staple of even the best science education. Students must engage in the genuine inquiry, it is argued (involving the formulation of questions, design of investigations, and coordination of theory and evidence with respect to multivariable systems) that is characteristic of real science.
The data presented here suggest that the skills required to engage effectively in typical forms of inquiry learning cannot be assumed to be in place by early adolescence. If students are to investigate, analyze, and accurately represent a multivariable system, they must be able to conceptualize multiple variables additively coacting on an outcome. Our results indicate that many young adolescents find a model of multivariable causality challenging. Correspondingly, the strategies they exhibit for accessing, examining, and interpreting evidence pertinent to such a model are far from optimal. We turn later to curriculum implications that we believe follow from these findings and consider first what the results suggest regarding the nature of these cognitive competencies and how they develop.
What Develops?
The performance skills (notably the controlled comparison strategy) that have been the focus of attention in research on scientific reasoning arguably are only one piece of a complex structure of related skills that undergoes development. This structure needs to be defined both horizontally (with respect to the components it includes) and vertically (with respect to first its emergent and ultimately its consolidated forms). An attempt to depict the horizontal structure appears in Figure 2, presented earlier. Key components of this model are (a) the full cycle of inquiry activity, beginning with the critical skill of identifying the questions to be asked and culminating in the advancement of claims in argumentive discourse; (b) the metalevel of understanding (of both strategies, depicted on the left side of Figure 2, and knowledge, depicted on the right side) that both directs and is influenced by performance, as discussed earlier; and (c) values associated with inquiry activity, highlighted by Resnick and Nelson-LeGall (1997) and discussed earlier. Related to values and also represented on the right side of Figure 2 is metalevel epistemological understanding of the nature of one's own and others' knowledge and knowing (Kuhn, Cheney, & Weinstock, in press). The broad implication to be drawn from Figure 2 is that there is more to effective knowing than the performance skills themselves (Kuhn, in press-b).
Vertical specification refers to the fact that a complex structure of this sort does not emerge fully formed but, more likely, undergoes a gradual evolution. Research with young elementary school children (Lehrer & Schauble, in press) has made it clear that even very basic forms of organizing and representing data (such as the frequencies of a set of possible outcomes) pose challenges to young children, and the relevant understandings and skills must be painstakingly constructed. In this sense, the finding highlighted in this work (that slightly older children have difficulty in representing relations between multiple antecedent variables and multiple outcomes) should not be surprising. At the other end of the vertical continuum, it is relevant to note that in earlier research (Kuhn et al., 1995), adult community college students who were readily able to use the controlled comparison strategy to identify effects of individual features nonetheless often had trouble explaining outcomes that were the additive product of two individual effects and fluctuated from one feature to the other in accounting for the outcome, seeing it as their task to explain which single feature had produced the outcome. Recognizing their simultaneous additive influence was a conceptual hurdle that rivaled in difficulty the conceptual hurdle posed by interaction effects. Unrepresented in the inquiry activity in which students engaged in this work is the further conceptual challenge that is posed when outcomes are not deterministic (as they were in our activity) but rather are a probabilistic distribution around some central tendency. Students of any age will not be successful in understanding interactive influences on probabilistic outcomes until they have mastered the more elementary model on which we focus here, involving multiple effects additively acting on an outcome.
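The elementary additive model can be illustrated with a short deterministic sketch. The three causal features and two noncausal features are those reported for the flood task, but the baseline, the effect sizes, the level names, and the direction of each effect are hypothetical, and the model is purely additive by assumption (it omits any interaction effect).

```python
from typing import Dict

BASE_FLOOD_FT = 1  # hypothetical baseline flood level, in feet

# Hypothetical additive contribution (in feet) of each level of each causal
# feature. Water pollution and elevation are noncausal and so are absent.
EFFECTS_FT: Dict[str, Dict[str, int]] = {
    "water temperature": {"cold": 0, "hot": 1},
    "soil depth": {"deep": 0, "shallow": 1},
    "soil type": {"clay": 0, "sand": 2},
}


def predicted_flood(record: Dict[str, str]) -> int:
    """Sum the baseline and each causal feature's contribution."""
    return BASE_FLOOD_FT + sum(
        levels[record[feature]] for feature, levels in EFFECTS_FT.items()
    )


def effect_of_change(record: Dict[str, str], feature: str, new_level: str) -> int:
    """Change in predicted flooding when one feature alone is changed."""
    changed = {**record, feature: new_level}
    return predicted_flood(changed) - predicted_flood(record)


context_a = {
    "water temperature": "cold",
    "soil depth": "deep",
    "soil type": "clay",
    "water pollution": "low",
    "elevation": "low",
}
context_b = {**context_a, "water temperature": "hot", "soil depth": "shallow"}

# Under additivity the effect of soil type is the same in both contexts.
print(effect_of_change(context_a, "soil type", "sand"))
print(effect_of_change(context_b, "soil type", "sand"))
```

In this sketch a feature's effect is consistent across contexts and the effects sum, which is the sense of "additive" used here. A mental model in which effects are neither consistent nor additive has no counterpart to `effect_of_change` that returns the same value in every context.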
Mental Models
Mental models of any sort remain essentially unobservable theoretical constructs. Performance indicators of various types serve as evidence that a particular mental model is in operation, but no empirical data can indicate with certainty the operation of a particular mental model. In inquiry activities, mental models are the individual's representation of the (virtual or actual) reality that is being investigated. For this reason, they are likely to influence the strategies that are brought to bear on the task. Nonetheless, we cannot say with certainty that it was revision in mental models that brought about the changes observed over time in this work. Such revision could be an effect rather than a cause. Nor would we want to claim that the kind of intervention undertaken in this work represents the only sound approach to facilitating development of the cognitive competencies we have identified as involved in inquiry learning. However, this intervention was targeted at the metalevel of cognition depicted in Figure 2, and we do want to claim that this level of understanding about inquiry, in contrast to the "understanding how" emphasized in performance-focused interventions, plays an essential role in effecting change. Metalevel understanding can come about as a product of the exercise of performance skills, as well as by direct targeting, but it cannot be bypassed.
The importance of this metalevel of understanding about inquiry is also underscored by the fact that in most of the knowledge seeking that students may engage in outside of a formal school setting, they are unlikely to have the opportunity to devise and execute controlled experiments. Much more often, they will be in a position of interpreting evidence derived from partially controlled or natural experiment data (Kuhn & Brannock, 1977). It is all the more important, then, that their interpretations not be compromised by an inadequate mental representation of the multivariable causality that such data are likely to reflect. Equally critical is metalevel understanding of the strengths and weaknesses of inference strategies that may be effective, effective but inefficient, or ineffective and fallacious. Again, what to do (when controlled experimentation is possible) is only one piece of a larger knowledge structure that includes what not to do and why, as well as what to conclude when controlled experiment is not feasible: to know when we do not know, when we have a way to find out, and when we will never know (Kuhn, in press-b).
Patterns and Mechanisms of Change
The results presented here confirm earlier research (Kuhn et al., 1995; Kuhn et al., 1992; Schauble, 1990, 1996) indicating that exercise can be a sufficient condition to induce strategic change, both in increasing the frequency of effective strategies and decreasing the frequency of ineffective ones. This work extends these findings to metalevel understanding of task and strategies and the mental models of causality associated with them. In addition to performance, metalevel understanding (measured both directly and indirectly, the latter via investigatory intent) increases with exercise. This change at dual levels supports the kind of continuous feedback model depicted in Figure 2.
An additional finding of this work is that exercise directly at the metalevel (in the experimental condition) further enhances change. These benefits (indicated by significant effects of condition) were seen either specifically at the metalevel (in both direct and indirect measures) or in the transfer to a new task at the performance level. (Condition differences, recall, did not reach significance at the performance level for the main task, though they were in the expected direction.) The social component of the exercise at both performance level and metalevel (in both cases, students worked in pairs), it should be noted, in itself provides a weak form of metalevel exercise for students in both conditions. If partners are suitably matched, students show higher levels of performance when working with a partner than they do when working alone on the same task (Kuhn, in press-c). The externalization of metalevel decisions in social dialogue presumably supports this normally covert level of processing. We did not make this comparison (between social and solitary conditions) in this study, however, because we wished to identify the effect of direct metalevel exercise.
More specific than this general model of dual-level change are the particular metalevel understandings and performance-level strategies that were the object of the present research. Although understanding of task objectives is critical to performance of most cognitive tasks (Kuhn & Pearsall, 1998; Schauble et al., 1991; Siegler & Crowley, 1994), in this case we have argued specifically that metalevel understanding of the task objective of identifying the effects of individual features (a) requires a correct mental model of multivariable causality and (b) is a prerequisite for consistent choice of the controlled comparison strategy. Logically, the value and power of the controlled comparison strategy cannot be appreciated in the absence of this mental model. Empirically, our data support this claim. Progress in understanding the task objective as one of identifying the effects of each of the individual features (which we took as an index of an accurate mental model of multivariable causality) showed significant effects of both time and experimental condition and, in analyses of individual patterns, was associated with good strategy usage. An implication for research on scientific reasoning is that investigatory intent is at least as important as the controlled comparison strategy as a topic of [...]
In examining individual students' patterns of performance, we found mixture (of levels) and gradual change to be the rule rather than the exception, consistent with the findings of microgenetic research (Kuhn, 1995; Siegler & Crowley, 1991). Because students worked with a changing set of partners who produced a collaborative performance, these results do not allow microgenetic analysis of individual change patterns. Also, it is not obvious exactly what the parallel of strategy mixture might be in the domain of mental models. Students may display a confused or incoherent model in the course of transition from a less correct to a more correct model or, as they do in the case of strategies, they may rely on one approach (model) at one time and a different one at another. Our data do not allow us to choose definitively between these two alternatives, but they do suggest that a shift in mental models, like strategy shifts, is not abrupt and total, but more likely takes place slowly and in gradual steps.
The Process-Content Debate
The inquiry activity that students in this study engaged in was deliberately designed as "content-lean," in the sense that we were not undertaking to teach students any significant body of scientific knowledge. Instead, our approach was to examine in as simple a context as possible the strategies, metastrategic understanding, and attendant generic mental models required for productive inquiry regarding relations among variables. If faulty strategies and mental models are observed in this context, it is likely that they will be present as well in a more complex, content-rich environment (though they will be harder to identify and examine in that context).
A contrasting point of view is that a more content-rich context would have facilitated the reasoning observed in this study. In other domains of inferential reasoning, for example, Wason's (1983) four-card problem, performance has been shown to improve dramatically when the problem is situated in a familiar context. There is an important difference, however, between that reasoning paradigm and the one investigated here. In the former, the objective in providing a familiar context is to facilitate reasoners' recognition and, hence, application of a form of inference they already know well (e.g., permission and obligation).
This situation, in contrast, is a bit more complex because we are looking to do more than invoke a well-established reasoning scheme. The broad-level process skill in question, the coordination of theory with new evidence, can proceed in several different ways. If new evidence is entirely compatible with an existing theory, the evidence may readily be integrated into it and become part of its representation. However, this does not guarantee that this new evidence will be represented independently of the theory and brought to bear on it, which we identify as a hallmark of mature or skilled scientific thinking (Kuhn, in press-a; Kuhn & Pearsall, 2000). Instead, evidence may be integrated as an "illustration" of what is already accepted as true, or it may simply be assimilated without [...]
The more interesting case, because it allows a clearer assessment of scientific thinking as a process, is one in which evidence conflicts with theory and, hence, is not readily assimilable, forcing the individual to ignore, dismiss, or distort it or, alternatively, to represent it accurately and evaluate its bearing on the theory. In the case in which the theoretical representation is richly elaborated and highly familiar, it is not clear that scientific thinking (again, as a process skill, in contrast to scientific understanding or knowledge) will be enhanced. Available evidence comparing scientific reasoning strategies across more and less familiar content suggests that contextually rich, highly elaborated, and highly familiar content, especially to the extent that it invokes entrenched beliefs, is motivating as a topic for contemplation but can resist the impingement of new evidence and, hence, work against proficient scientific thinking (Kuhn et al., 1995).
Implications for Science Education
An implication that should not be drawn from this research is that inquiry activity is inappropriate in the elementary or middle school science curriculum because students do not have the requisite skills to engage in it productively. The message we hope our work will convey is a different one, which is that supporting the design of inquiry curriculum for these critical years in science education should be identification of a sequence of well-delineated cognitive competencies that become the objective of this curriculum. In the absence of an explicit sequence of this nature, inquiry learning risks becoming a vacuous practice, one embraced without clear evidence of the cognitive processes or outcomes that it is likely to foster.
We believe this study makes a contribution in this respect, but such an effort is far from complete. The skills and understanding we have highlighted here lie somewhere in the middle of an extended developmental hierarchy. The kinds of elementary skills in posing questions and representing data that Lehrer and Schauble (in press) have studied form the initial levels of this hierarchy and are its essential foundation. At its upper levels are the skills and understanding needed to construct data-based models of causal systems that include multiple layers of causality and multiple variables (and variable levels) that interactively influence probabilistic outcomes. These are skills integral to the scientific inquiry that occurs in professional science.
The intervention aspect of this research similarly leaves much still to be learned. From an educational perspective, the major question is not exactly why the intervention was effective but why it was not more effective. At best, we can speculate as to what kinds of interventions might have been more effective for the sizable minority of students who showed little or no evident benefit from the experience we provided. Our work does point to (a) investigatory intent, (b) a mental model of multivariable causality, and (c) metalevel understanding as promising targets of future intervention efforts. However, more and different kinds of efforts certainly seem warranted, especially in view of the enormous current interest in inquiry as a teaching method.
A final comment has to do with the connection between scientific thinking and science education. A view emphasized in this work, and reflected in Figure 2, is that scientific thinking encompasses a good deal more than the controlled comparison strategy that has been the focus of attention in most developmental research on scientific thinking. A related view has been expressed in recent writing on science education that emphasizes the importance of formulating productive questions, representing observations in insight-generating ways, and advancing and debating claims in a framework of scientific argument (Lehrer, Carpenter, Schauble, & Putz, 2000). Mastering the coordination of questions, data representations, and argument, Lehrer et al. claimed, "puts students on the road to becoming authors of scientific knowledge" (p. 97).
Despite its comprehensiveness in encompassing all phases of scientific activity
(from inquiry through argument), the kind of inquiry activity featured in this study
is by itself far from a model of what a comprehensive science curriculum should
be. Still, we do see such an activity as valuable as one strand interwoven into a rich
middle school science curriculum. Its value as an educational tool, we believe, lies
in its focusing attention on the forms of question asking and answering that are
central to scientific thinking. By directing students' attention to the thinking they
do in addressing scientific questions, we not only implicitly convey values and
standards of science ("How do you know?"), but we also develop metalevel awareness
and, ultimately, regulation of questions, of data representations, and of
inferences that do, and especially that do not, follow from what is observed. Of
course, we want students to acquire rich and deep understanding of the world
around them as a goal of their science education, but awareness and understanding
of their own and others' thinking about scientific questions seem important
enough to warrant a prominent place in this curriculum.
REFERENCES
Bransford, J., Brown, A., & Cocking, R. (Eds.). (1999). How people learn: Brain, mind, experience, and
school (Report of the National Research Council). Washington, DC: National Academy Press.
Case, R. (1974). Structures and strictures: Some functional limitations on the course of cognitive
growth. Cognitive Psychology, 6, 544-573.
Cavalli-Sforza, V., Weiner, A., & Lesgold, A. (1994). Software support for students engaging in
scientific activity and scientific controversy. Science Education, 78, 577-599.
Chan, C., Burtis, J., & Bereiter, C. (1997). Knowledge-building as a mediator of conflict in conceptual
change. Cognition and Instruction, 15, 1-40.
Chen, Z., & Klahr, D. (1999). All other things being equal: Acquisition and transfer of the control of
variables strategy. Child Development, 70, 1098-1120.
Crowley, K., & Siegler, R. (1999). Explanation and generalization in young children's strategy learning.
Child Development, 70, 304-316.
de Jong, T., & van Joolingen, W. R. (1998). Scientific discovery learning with computer simulations of
conceptual domains. Review of Educational Research, 68(2), 179-201.
DeLoache, J., Miller, K., & Pierroutsakos, S. (1998). Reasoning and problem solving. In W. Damon
(Series Ed.) & D. Kuhn & R. Siegler (Vol. Eds.), Handbook of child psychology: Vol. 2. Cognition,
language, and perception (5th ed., pp. 801-850). New York: Wiley.
Eisenhart, M., Finkel, E., & Marion, S. (1996). Creating the conditions for scientific literacy: A
re-examination. American Educational Research Journal, 33, 261-295.
Gentner, D., & Stevens, A. (Eds.). (1983). Mental models. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Klahr, D. (2000). Exploring science: The cognition and development of discovery processes.
Cambridge, MA: MIT Press.
Klahr, D., Fay, A. L., & Dunbar, K. (1993). Heuristics for scientific experimentation: A developmental
study. Cognitive Psychology, 25, 111-146.
Kuhn, D. (1995). Microgenetic study of change: What has it told us? Psychological Science, 6, 133-139.
Kuhn, D. (in press-a). What is scientific thinking and how does it develop? In U. Goswami (Ed.),
Handbook of childhood cognitive development. Oxford, England: Blackwell.
Kuhn, D. (in press-b). How do people know? Psychological Science.
Kuhn, D. (in press-c). Why development does (and doesn't) occur: Evidence from the domain of
inductive reasoning. In R. Siegler & J. McClelland (Eds.), Mechanisms of cognitive development: Neural
and behavioral perspectives. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Kuhn, D., Amsel, E., & O'Loughlin, M. (1988). The development of scientific thinking skills. New York:
Kuhn, D., & Angelev, J. (1976). An experimental study of the development of formal operational
thought. Child Development, 47, 697-706.
Kuhn, D., & Brannock, J. (1977). Development of the isolation of variables scheme in experimental and
"natural experiment" contexts. Developmental Psychology, 13, 9-14.
Kuhn, D., Cheney, R., & Weinstock, M. (in press). The development of epistemological understanding.
Cognitive Development.
Kuhn, D., Garcia-Mila, M., Zohar, A., & Andersen, C. (1995). Strategies of knowledge acquisition.
Monographs of the Society for Research in Child Development, 60(4, Serial No. 245).
Kuhn, D., & Ho, V. (1980). Self-directed activity and cognitive development. Journal of Applied
Developmental Psychology, 1, 119-133.
Kuhn, D., Ho, V., & Adams, C. (1979). Formal reasoning among pre- and late adolescents. Child
Development, 50, 1128-1135.
Kuhn, D., & Pearsall, S. (1998). Relations between metastrategic knowledge and strategic performance.
Cognitive Development, 13, 227-247.
Kuhn, D., & Pearsall, S. (2000). Developmental origins of scientific thinking. Journal of Cognition and
Development, 1, 113-129.
Kuhn, D., & Phelps, E. (1982). The development of problem-solving strategies. In H. Reese (Ed.),
Advances in child development and behavior (Vol. 17, pp. 1-44). New York: Academic.
Kuhn, D., Schauble, L., & Garcia-Mila, M. (1992). Cross-domain development of scientific reasoning.
Cognition and Instruction, 9, 285-327.
Lehrer, R., Carpenter, S., Schauble, L., & Putz, A. (2000). Designing classrooms that support inquiry. In
J. Minstrell & E. van Zee (Eds.), Inquiring into inquiry learning and teaching in science (pp. 80-99).
Washington, DC: American Association for the Advancement of Science.
Lehrer, R., & Schauble, L. (in press). Modeling in mathematics and science. In R. Glaser (Ed.),
Advances in instructional psychology (Vol. 5). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
McGinn, M., & Roth, W. (1999). Preparing students for competent scientific practice: Implications of
recent research in science and technology studies. Educational Researcher, 28, 14-24.
Palincsar, A., & Magnusson, S. (in press). The interplay of first-hand and second-hand to
model and support the development of scientific knowledge and reasoning. In S. Carver & D. Klahr
(Eds.), Cognition and instruction: Twenty-five years of progress. Mahwah, NJ: Lawrence Erlbaum
Pearsall, S. (1999). Effects of metacognitive exercise on the development of scientific reasoning.
Unpublished doctoral dissertation, Teachers College, Columbia University, New York.
Resnick, L., & Nelson-LeGall, S. (1997). Socializing intelligence. In L. Smith, J. Dockrell, & P.
Tomlinson (Eds.), Piaget, Vygotsky, and beyond (pp. 145-158). Boston: Routledge & Kegan Paul.
Schauble, L. (1990). Belief revision in children: The role of prior knowledge and strategies for
generating evidence. Journal of Experimental Child Psychology, 49, 31-57.
Schauble, L. (1996). The development of scientific reasoning in knowledge-rich contexts.
Developmental Psychology, 32, 102-119.
Schauble, L., Klopfer, L., & Raghavan, K. (1991). Students' transition from an engineering model to a
science model of experimentation. Journal of Research in Science Teaching, 28, 859-882.
Siegler, R., & Crowley, K. (1991). The microgenetic method: A direct means for studying cognitive
development. American Psychologist, 46, 606-620.
Siegler, R., & Crowley, K. (1994). Constraints on learning in nonprivileged domains. Cognitive
Psychology, 27, 194-226.
Sophian, C. (1997). Beyond competence: The significance of performance for conceptual
Cognitive Development, 12, 281-303.
Vosniadou, S., & Brewer, W. (1992). Mental models of the earth: A study of conceptual change in
childhood. Cognitive Psychology, 24, 535-585.
Wason, P. (1983). Realism and rationality in the selection task. In J. St. B. Evans (Ed.), Thinking
reasoning: Psychological approaches (pp. 44-75). London: Routledge & Kegan Paul.
Zimmerman, C. (2000). The development of scientific reasoning skills. Developmental Review, 20,