Trilling (2015), Four-dimensional Education: the Competencies Learners Need to Succeed, Center for Curriculum Redesign, Cambridge, MA. Note: Designed to be used as part of a longer assessment of problem solving, the PEEP task challenges students to build a viable ecosystem and place it in a natural environment where it can thrive. [34] Shute, V. and M. Ventura (2013), Stealth Assessment: Measuring and Supporting Learning in Video Games, in John, D. and C. MacArthur (eds. The seminal volume Knowing What Students Know (National Research Council, 2001[13]) brought cognitive theory into the assessment realm using a framework accessible to teachers and policy makers. Why game- or simulation-based assessment in education? (2015), Does agency matter? 82, pp. Unlike interim assessments, which can be aggregated at various education levels and are related to broad summative goals, formative assessments are adjusted to individual needs and to immediate teaching strategy (Shepard, Penuel and Pellegrino, 2018[21]). Game-based or simulation-centred assessments collect a wealth of data that traditional tests often miss or cannot capture, sometimes stealthily or unbeknownst to the test-taker (Shute and Ventura, 2013[34]). (2019), Summative Game-based Assessment, in Ifenthaler, D. and Y. Kim (eds. [23] Marion, S. et al. An additional challenge to consider with GBAs is the need to make them accessible for students with disabilities. PEEP is designed to be used eventually in high-stakes, summative assessment and supports the creation of many parallel forms or versions to improve test security. Successful mapping of telemetry to measurement objectives requires a concerted effort among designers, software engineers, and measurement scientists.
While this has led to the development of a range of e-learning applications to be used both inside and outside of the classroom (from virtual labs to medical e-learning tools with simulations), this technological advancement has also opened avenues for a new generation of standardised assessments. [1] Braun, H. and A. Kanjee (2006), Using assessment to improve education in developing nations, in Braun, H. et al. As Mislevy (2018[35]) notes, "the worst way to design a game- or simulation-based assessment is to design what seems to be a great game or simulation, collect some observations to obtain whatever information the performances provide, then hand the data off to a psychometrician or a data scientist to figure out how to score it." While there are benefits to post hoc exploratory analyses, they should not be the driving mechanism for how one scores the assessment. As part of this improvement initiative, there has been a growing movement around and interest in new assessment technologies and approaches, including immersive, game- or simulation-based assessments (GBAs) (DiCerbo, 2014[3]; Shaffer et al., 2009[4]; Shute, 2011[5]). [43] Darrah, M. (2013), Computer Haptics: A New Way of Increasing Access and Understanding of Math and Science for Students Who are Blind and Visually Impaired. For example, while a traditional standardised test may be a valid, reliable, fair, and efficient way to measure algebra, it may not be a modality suitable for measuring constructs like creative thinking or collaborative problem solving. While traditional educational assessments are designed to generally meet standards of technical quality in areas like validity (does the assessment measure what it is supposed to measure? [31] Seelow, D.
(2019), The Art of Assessment: Using Game Based Assessments to Disrupt, Innovate, Reform and Transform Testing. To develop these, PEEP uses an algorithm to create viable ecosystem solutions of approximately equivalent difficulty based on a large library of organisms. (2019), The Expanded Evidence-Centered Design (e-ECD) for Learning and Assessment Systems: A Framework for Incorporating Learning Goals and Processes Within Assessment Design, Frontiers in Psychology, Vol. For example, they are more costly and difficult to develop than traditional standardised tests based on a simple succession of discrete questions or small tasks. 21-34, http://dx.doi.org/10.1111/emip.12189. [4] Shaffer, D. et al. Part of the responsible development of GBA is to monitor these gaps and to minimise differential item functioning (DIF, which occurs when items do not behave as expected for test-takers of the same ability but different backgrounds), both for the usual subgroups (gender, ethnicity, language status) and potentially for new ones such as gaming experience (see Box 10.2). [38] Yang, F. et al. ), Game-based Assessment Revisited, Springer. 1/2, pp. 28/3, pp. [36] Gobert, J., R. Baker and M. Wixon (2015), Operationalizing and Detecting Disengagement Within Online Science Microworlds, Educational Psychologist, Vol. This is an especially relevant critique in education for two reasons. [9] Duncan, R. and C. Hmelo-Silver (2009), Learning progressions: Aligning curriculum, instruction, and assessment, Journal of Research in Science Teaching, Vol. [12] Nichols, S. and H. Dawson (2012), Assessment as a Context for Student Engagement, in Handbook of Research on Student Engagement, Springer US, Boston, MA, http://dx.doi.org/10.1007/978-1-4614-2018-7_22. 333-353, http://dx.doi.org/10.1007/bf02295640. 378-392, http://dx.doi.org/10.1016/j.compedu.2014.12.011.
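The DIF monitoring mentioned above can be made concrete with a small worked example. The sketch below computes a Mantel-Haenszel common odds ratio for one dichotomously scored item, comparing a reference and a focal group within strata of matched ability. Mantel-Haenszel is a widely used DIF screening statistic, but it is swapped in here for illustration and is not a method the chapter itself prescribes; the function names, record layout, and toy counts are all hypothetical.

```python
import math
from collections import defaultdict


def make_records(stratum, group, n_correct, n_incorrect):
    """Build toy (stratum, group, correct) records; purely illustrative data."""
    return [(stratum, group, 1)] * n_correct + [(stratum, group, 0)] * n_incorrect


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio across ability strata for one item.

    records: iterable of (stratum, group, correct), where group is "ref" or
    "focal" and correct is 1 or 0. A value near 1.0 suggests no DIF; values
    above 1.0 favour the reference group.
    """
    # For each stratum keep [n_correct, n_incorrect] per group.
    tables = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for stratum, group, correct in records:
        tables[stratum][group][0 if correct else 1] += 1

    numerator = 0.0
    denominator = 0.0
    for table in tables.values():
        a, b = table["ref"]      # reference group: correct, incorrect
        c, d = table["focal"]    # focal group: correct, incorrect
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no discordant cells")
    return numerator / denominator


def mh_delta(odds_ratio):
    """Rescale the odds ratio to the delta metric (-2.35 * ln of the ratio)."""
    return -2.35 * math.log(odds_ratio)


# Hypothetical subgroups: experienced gamers ("ref") vs. non-gamers ("focal").
toy = make_records(1, "ref", 8, 2) + make_records(1, "focal", 5, 5)
ratio = mantel_haenszel_odds_ratio(toy)
```

In practice the strata would come from a matching score such as the total test score, and a flagged item would be reviewed by content experts rather than dropped automatically.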
This includes storyboarding out the measures of interest, determining the evidence needed to capture them, and the exact quantification of that evidence. 33-53, http://dx.doi.org/10.1162/ijlm.2009.0013. Therefore, the item design process should take place nearer to the beginning of the entire project, as designing a GBA takes a significant amount of forethought and discipline and mistakes can be very costly. ), Improving Education Through Assessment, Innovation, and Evaluation, Cambridge, Mass. For example, psychometricians have suggested new measurement models reflecting task complexity (Mislevy et al., 2000[44]; Bradshaw, 2016[45]; de la Torre and Douglas, 2004[46]). Summative tests are given at the end of instruction to evaluate what has been learned. 715-730, http://dx.doi.org/10.1037/0022-0663.88.4.715. Success requires an interdisciplinary team with a broad range of skills, including game designers, software engineers (ideally with a background in games), and cognitive scientists, as well as the test designers, content experts, educational researchers, and psychometricians usually needed to develop an assessment. This includes the quantification of evidence and scales that will be used. By having the students actually collaborate in a cooperative game, Crisis in Space delivers an authentic and engaging experience and improves upon earlier attempts to measure collaboration via student-agent (chatbot) interaction. In other words, the AI can be used to play all of the proposed variations of the GBA as a means of increasing the likelihood that they are all comparable in difficulty before moving to expensive and time-consuming pilot testing with human test-takers. [7] Shute, V. et al. However, game-based approaches often do not produce as many useable item scores as we might hope given their relatively high development cost when compared to more traditional, discrete items.
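The a priori design step described above (deciding what is measured, which evidence captures it, and how that evidence is quantified) can be sketched as a set of evidence rules declared before any game is built. This is a minimal illustration only: the `EvidenceRule` structure, the event fields `t` and `type`, the construct names, and the two scoring functions are hypothetical and do not come from any of the assessments discussed in the chapter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]  # one telemetry record, e.g. {"t": 4.5, "type": "inspect"}


@dataclass(frozen=True)
class EvidenceRule:
    """Links a construct to one observable and the rule that quantifies it."""
    construct: str
    observable: str
    score: Callable[[List[Event]], float]


def count_events(kind: str) -> Callable[[List[Event]], float]:
    """Return a scorer that counts telemetry events of the given type."""
    def scorer(events: List[Event]) -> float:
        return float(sum(1 for e in events if e["type"] == kind))
    return scorer


def time_on_task(events: List[Event]) -> float:
    """Seconds between the first and last logged event (0.0 for an empty log)."""
    times = [float(e["t"]) for e in events]
    return max(times) - min(times) if times else 0.0


# Hypothetical rules, written down before the game specification exists.
RULES = [
    EvidenceRule("systematic exploration", "organisms_inspected", count_events("inspect")),
    EvidenceRule("persistence", "seconds_on_task", time_on_task),
]


def score_log(events: List[Event], rules: List[EvidenceRule]) -> Dict[Tuple[str, str], float]:
    """Apply every declared rule to one test-taker's telemetry log."""
    return {(r.construct, r.observable): r.score(events) for r in rules}


log = [
    {"t": 0.0, "type": "inspect"},
    {"t": 4.5, "type": "inspect"},
    {"t": 9.0, "type": "place"},
    {"t": 12.0, "type": "submit"},
]
scores = score_log(log, RULES)
```

Keeping the rules in one declarative list means designers, engineers, and psychometricians can review the same artefact, and any telemetry event that no rule consumes is visible as unused rather than discovered after data collection.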
This chapter discusses how recent advancements in digital technology could lead to a new generation of game-based standardised assessments in education, providing education systems with assessments that can test more complex skills than traditional standardised tests can. In addition to requiring a broader range of technical expertise, GBAs can also require innovation in technologies or statistical approaches to measurement. Rapid technological developments such as virtual/augmented reality, digital user interface and experience design, machine learning/artificial intelligence, and educational data mining have led to the improvement of simulated digital environments, and accelerated progress in the quality and design of digital simulations and video games. [24] Verger, A., L. Parcerisa and C. Fontdevila (2019), The growth and spread of large-scale assessments and test-based accountabilities: a political sociology of global education reforms, Educational Review, Vol. [52] Chopade, P. et al. 706-732, http://dx.doi.org/10.3102/1076998618784700. Not only should the designers conduct the traditional empirical psychometric analyses necessary to create valid and reliable assessments, they should also take advantage of the wealth of additional data generated by GBA to apply novel methods from domains like machine learning to extract more useable information about test-takers' ability or other constructs where possible e.g. The absence of a gender difference following the use of the multiple-mice version of the game suggests that the choice of platform can create a gender gap in learning that is unrelated to the game. The use of games or simulations is a very promising way to assess these complex constructs either as part of a revised curricular framework or as a novel addition to the content covered by the usual standardised tests (Stecher and Hamilton, 2014[30]; Seelow, 2019[31]). 17-28.
(2012), Design and discovery in educational assessment: Evidence-centered design, psychometrics, and educational data mining. 43-57, http://dx.doi.org/10.1080/00461520.2014.999919. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST). [6] Mislevy, R. et al. Note: Crisis in Space is a pilot game-based assessment under development by ACT, Inc. as part of an ongoing program of research and development in collaborative problem-solving assessment by its research arm, ACTNext. [39] Sabourin, J. et al. While some of this capacity can be contracted out to private-sector vendors, successful implementation will require public capabilities as well. (eds.). [22] Ferrara, S. et al. [16] Arieli-Attali, M. et al. It will be launched (and legally recognised for exams in Germany) in 2022. 37/1, pp. [37] Deterding, S. et al. 44/6, pp. In the version of the game implemented on a platform with multiple computer mice, students play in groups of three with each student controlling one mouse. 15/3, pp. 71/1, pp. [41] Bergner, Y. and A. von Davier (2018), Process Data in NAEP: Past, Present, and Future. 79/4, pp. Before the assessment design team starts to develop the game specifications, they must first outline what they intend to measure and how this will be accomplished. As an interim measure, the student can be assessed under more standardised simulation conditions to gauge progress toward summative goals. 295-321. [8] Sanders, W. and S. Horn (1995), Educational Assessment Reassessed, Education Policy Analysis Archives, Vol.
Formatively, the GBA could provide continuous feedback and personalised suggestions in the course of instruction. The results from the experiment revealed statistically significant differences in performance between boys and girls after they used the augmented reality platform. In the augmented reality version, students can perform the same actions using a tablet.
Interpretation of streaming data from gameplay or interaction with a carefully designed digital user interface allows researchers to evaluate how people go about solving problems and can lead to more targeted feedback (Chung, 2014[15]). [50] Games for change (n.d.), Games for change, http://www.gamesforchange.org/game/simcityedu-pollution-challenge/ (accessed on 30 April 2021). [21] Shepard, L., W. Penuel and J. Pellegrino (2018), Using Learning and Motivation Theories to Coherently Link Formative Assessment, Grading Practices, and Large-Scale Assessment, Educational Measurement: Issues and Practice, Vol. Accordingly, there is growing interest in rationalising this confusing and fractured system. 116/11. Beyond psychometric innovation, game- and simulation-based assessment also poses new opportunities for technical innovation based on recent developments in machine learning and artificial intelligence (Ciolacu et al., 2018[48]). creativity, collaboration or socioemotional skills), as well as better measurement of some aspects of the thinking of respondents, including in traditional domains like science and mathematics. Examples of summative assessment applications in education include annual large-scale accountability tests and college entrance exams, but also "drop from the sky" monitoring tests like PISA, TIMSS, and various national assessments (Oranje et al., 2019[20]). This includes patterns of choices, search behaviours, time-on-task behaviours, and, in some cases, eye movement or other biometric information. 45-49, http://dx.doi.org/10.1177/016264340001500307. First, modern curricular frameworks around the world are increasingly multidimensional, including cross-cutting skills as well as more traditional academic content.
(2018), Education 4.0 - Artificial Intelligence Assisted Higher Education: Early recognition System with Machine Learning to support Students Success, 2018 IEEE 24th International Symposium for Design and Technology in Electronic Packaging (SIITME), http://dx.doi.org/10.1109/siitme.2018.8599203. (eds. While games and simulations in assessment have been most often targeted at the formative level, recent advances in development and scoring have made their use in large-scale summative tests in national accountability systems and international comparisons more feasible (Verger, Parcerisa and Fontdevila, 2019[24]; Klieme, 2020[25]). [14] Darling-Hammond, L. et al. PEEP can also be delivered as a stage-adaptive assessment task where test-takers are presented with a series of problems to solve whose difficulty varies algorithmically depending on prior performance. (2011), When Off-Task is On-Task: The Affective Role of Off-Task Behavior in Narrative-Centered Learning Environments, in. For example, while adapting an existing game for use as an assessment may, at first glance, appear to generate a large amount of data for each test-taker, it is often the case that such data may yield items or measurement opportunities that are poorly aligned to the desired content domain, exhibit high intercorrelation (rendering many of them useless), or are at the wrong level of difficulty (i.e. Players assume the role of astronauts sent on a mission to bring back a precious crystal. They provide measurement models appropriate to the new data streams generated by games and simulations. [48] Ciolacu, M. et al. the data collected during the assessment game/simulation process). [15] Chung, G.
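The point above about high intercorrelation can be illustrated with a simple screening step. The sketch below computes pairwise Pearson correlations between candidate item scores derived from telemetry and lists the pairs that are so strongly correlated that one of them adds little new information. The item names, the toy scores, and the 0.9 threshold are hypothetical choices for illustration, not values taken from any operational assessment.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length score lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(item_scores, threshold=0.9):
    """Return sorted item-name pairs whose absolute correlation meets the threshold."""
    flagged = []
    for first, second in combinations(sorted(item_scores), 2):
        r = pearson(item_scores[first], item_scores[second])
        if abs(r) >= threshold:
            flagged.append((first, second))
    return flagged


# Hypothetical scores for four test-takers on three telemetry-derived items.
candidate_items = {
    "clicks_total": [1, 2, 3, 4],
    "clicks_on_map": [2, 4, 6, 8],    # moves in lockstep with clicks_total
    "used_hint": [1, 0, 1, 0],
}
flagged = redundant_pairs(candidate_items)
```

A flagged pair is a prompt for review: the design team would decide which observable is better aligned to the content domain before discarding the other.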
(2014), Toward the Relational Management of Educational Measurement Data, Teachers College Record, Vol. (2017), Principled Approaches to Assessment Design, Development, and Implementation, in The Handbook of Cognition and Assessment, John Wiley & Sons, Inc., Hoboken, NJ, USA, http://dx.doi.org/10.1002/9781118956588.ch3. There are many ways to incorporate games and game-based features into a system or assessment that have varying impact on the learner. In addition to digital training units using videos and simulations, the project is developing assessments that will be used as exams to certify apprentices' skills. While promising, this new generation of assessments brings its own challenges. The advantages of GBA, including the ability to assess historically hard-to-measure cognitive processes, better alignment with modern curricula, and increased student engagement in the measurement process, make it an important part of the future of all educational assessment systems. The assessment designer must determine, a priori, exactly what the game is attempting to measure and how each game-based element provides evidence that allows such measurement. [44] Mislevy, R. et al. [47] Echeverría, A. et al. What is the long-term promise of this approach and what is necessary to get us there? [27] Trilling, B. and C. Fadel (2009), 21st century skills: Learning for Life in Our Times, Jossey-Bass. (2010), Using balanced assessment systems to improve student learning and school capacity: An introduction, Council of Chief State School Officers, Washington, DC.
3, p. 6, http://dx.doi.org/10.14507/epaa.v3n6.1995. In education, traditional standardised assessments have long been dominated by a model centred on collections of discrete questions (or items) designed to cover content in an assessment framework by addressing parts of the domain to be measured (Mislevy et al., 2012[6]). (2019), Game Design for Eliciting Distinguishable Behavior, Paper prepared for the 33rd Conference on Neural Information Processing Systems, https://papers.nips.cc/paper/8716-game-design-for-eliciting-distinguishable-behavior.pdf (accessed on 2 January 2020). Some examples of game-based assessment in education. (2019), A Tricky Balance: The Challenges and Opportunities of Balanced Systems of Assessment, in Paper Presented at the Annual Meeting of the National Council on Measurement in Education, Toronto, Ontario, April 6, 2019, National Center for the Improvement of Educational Assessment, https://www.nciea.org/sites/default/files/inline-files/Marion%20et%20al_A%20Tricky%20Balance_031319.pdf (accessed on 2 January 2020). ), and fairness (is the assessment culturally sensitive, accessible, and free of bias against any groups of test-takers? GBAs, on the other hand, aim to blur the line between traditional assessment and more engaging learning activities through the use of games and simulations designed to measure constructs in an environment that maximises flow and rewards students for demonstrating their cognitive processes in more engaging and authentic situations, not just their ability to memorise key facts (Shute et al., 2009[7]). It goes without saying that most test-takers do not enjoy the traditional assessment experience (Nichols and Dawson, 2012[12]; Madaus and Russell, 2010[26]).
Moreover, the use of GBA should not be limited to summative assessment alone but should instead be part of a coherent system of assessment throughout the academic year. (2016), Challenging games help students learn: An empirical study on engagement, flow and immersion in game-based learning, Computers in Human Behavior, Vol. SimCityEDU: Pollution Challenge was a GBA released in 2014 by GlassLab, a collaborative development initiative funded by the John D. and Catherine T. MacArthur and Bill and Melinda Gates Foundations. Committee on Defining Deeper Learning and 21st Century Skills, National Academies Press, Washington, D.C., http://dx.doi.org/10.17226/13398. [42] Rose, D. (2000), Universal Design for Learning, Journal of Special Education Technology, Vol. Using the webcam at the top of the screen, the system determines the location of each student's astronaut by detecting the position of each student relative to the paper markers.
We draw an important distinction here between designing games or simulations explicitly for measurement purposes and gamification, that is, the addition of game-like elements to existing tasks or activities to increase engagement, flow, or motivation (Deterding et al., 2011[37]). Thus, building a GBA requires forethought about the exact types of features and their potential impact on the learner and data collection (Shute and Ventura, 2013[34]). (2009), Epistemic Network Analysis: A Prototype for 21st-Century Assessment of Learning, International Journal of Learning and Media, Vol. Since girls seem to struggle more to use the augmented reality platform, it is possible that using the technology for GBA would put them at a disadvantage. We now turn to a closer examination of the features of GBA and a brief discussion of how to design game-based tests. 4/1, pp. [17] Martone, A. and S. Sireci (2009), Evaluating Alignment Between Curriculum, Assessment, and Instruction, Review of Educational Research, Vol. Such game-based assessments allow for the assessment of a broader range of skills (e.g. Simply put, game-based assessments might not be as fun as real games. 5-13, http://dx.doi.org/10.1111/j.1745-3992.2009.00149.x. (2019), Fostering Students' Creativity and Critical Thinking: What it Means in School, Educational Research and Innovation, OECD Publishing, Paris, https://dx.doi.org/10.1787/62212c37-en. In order to realise the promise of game- and simulation-based assessment at the national level, education ministries need to invest in the infrastructure needed to design, implement, and operationally deliver such tests. too easy or too hard for the target population). ), Foundation Reports on Digital Media and Learning, The MIT Press, Cambridge, MA, http://dx.doi.org/10.7551/mitpress/9589.001.0001. A good design principle for such an assessment system would be to use relatively inexpensive, traditional assessments where feasible (e.g.
Here, the classroom blends with the game world: each desk is covered with a set of markers that allow the augmented reality system to place virtual objects over the desks. [46] de la Torre, J. and J. Douglas (2004), Higher-order latent trait models for cognitive diagnosis, Psychometrika, Vol. The education version has been adapted to reflect more accurate life sciences content and made developmentally appropriate for students. [49] Mislevy, R. et al. [3] DiCerbo, K. (2014), Game-Based Assessment of Persistence, Educational Technology & Society, Vol. [29] Fadel, C., M. Bialik and B. Game-based assessment in education also brings new fairness and equity concerns. [26] Madaus, G. and M. Russell (2010), Paradoxes of High-Stakes Testing. 5-30, http://dx.doi.org/10.1080/00131911.2019.1522045.
For example, while the benefits of GBA have led several operational assessment programs, such as PISA and the U.S. National Assessment of Educational Progress (NAEP), to add game or simulation components, due to cost they have done so in a limited fashion as part of a hybrid approach combined with more traditional item types and assessment strategies (Bergner and von Davier, 2018[41]).