Beyond Tests and Grades: How Stealth Assessments Are Changing Education
Leveraging digital environments to create invisible assessments for "hard-to-measure" competencies is helping to revolutionize how we assess and support learning
In a world where access to connected devices, and by extension learning content, is becoming ubiquitous, and advances in AI are driving the cost of personalized content creation (including tutors) close to zero, there are two key factors in education that provide the biggest leverage.
First, if you can learn anything you want or need to learn, then the key ingredient is motivation. In many ways, motivation is all you need.
Second, in order to leverage that scarce motivation, we need to make sure you are on the most suitable learning path for you. The only way to do that is to assess your level of knowledge in a given domain.
To use a Google Maps analogy, we are still waiting for the “GPS Moment” in education where we can almost seamlessly know where you are on the journey and create the optimal path for you.
A very promising development in this search for a GPS moment is “stealth assessments” that can help assess students without explicitly testing them.
To learn more about this promising development, I sat down with Prof. Seyedahmad Rahimi at the University of Florida, who has been working on and expanding this new approach to learning assessment.
(The interview has been edited for clarity)
How did you get into learning and assessment?
I first encountered stealth assessments when I came to the U.S. for my Ph.D. Initially, I knew nothing about assessments and wanted to pursue a completely different path building on my graduate work in Malaysia in e-learning.
Taking a course with Valerie Shute changed everything for me. It opened my eyes to the critical role assessments can play in a formative way, especially when it comes to hard-to-measure constructs like creativity, which are difficult to assess but vital to student development.
I started working with Val as a volunteer on various projects to explore how stealth assessment could be applied. This culminated in working with her when she received funding for two major projects—one for developing a new stealth assessment of physics understanding and learning support in Physics Playground, and another for creating “affective support” in the game.
The goal of the second project was to see if affective support could keep students engaged in gameplay and, as a result, enhance their learning and game enjoyment. I quickly became a central part of the team as the sole programmer of the game and learned a tremendous amount from her.
After I graduated, I joined the University of Florida and when Val retired, she passed the baton of stealth assessment to me.
My focus now is to make stealth assessment more accessible and understandable for a larger audience. In our academic papers, we often don't dive into the details, but I’m working on elaborating the steps needed to design and implement stealth assessments.
For those unfamiliar with the concept, could you explain stealth assessment in simpler terms? What makes it different from traditional assessment methods?
We typically think of assessments in two categories: formative, which supports learning, and summative, which measures what has been learned. Stealth assessment typically falls under formative assessment - although it could be adapted for summative assessment.
It’s called stealth because learners aren’t presented with explicit questions. Instead, they are immersed in tasks and progress naturally through a digital learning environment (e.g., a digital game).
What makes stealth assessment unique is that the environment is carefully designed with assessment points built in from the beginning. It’s not about just collecting clickstream data; the system is designed to interpret learners' behaviors to gauge their competencies.
Stealth assessment requires a digital learning environment and follows a psychometric framework—specifically, we use the Evidence-Centered Design (ECD) framework. It also provides scaffolding to help learners build up their skills, which adds complexity when it comes to assessing progress formatively.
Finally, a key facet of stealth assessment is taking a “glassbox” approach. This glassbox approach ensures that we not only get accurate results but can also explain the process behind them, making it a much more reliable and understandable form of assessment.
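To make this more concrete, here is a minimal sketch of what a glassbox evidence-accumulation loop can look like. This is my own illustration, not code from Physics Playground: the indicator names, probabilities, and events are assumptions, and a real stealth assessment would use a full Bayesian network over many competencies. The point is simply that every observation updates the competency estimate through rules you can inspect and explain.

```python
# Hypothetical sketch of glassbox evidence accumulation in a stealth assessment.
# Indicator names, probabilities, and events are illustrative assumptions,
# not taken from Physics Playground or any published competency model.

# For each observable indicator: P(observed | mastered), P(observed | not mastered).
INDICATORS = {
    "solved_level_with_lever":   (0.85, 0.30),
    "efficient_energy_solution": (0.70, 0.20),
    "many_failed_attempts":      (0.25, 0.60),
}

def update(prior: float, indicator: str, observed: bool) -> float:
    """Bayesian update of P(mastered) after observing one indicator."""
    p_if_mastered, p_if_not = INDICATORS[indicator]
    if not observed:
        p_if_mastered, p_if_not = 1 - p_if_mastered, 1 - p_if_not
    numerator = p_if_mastered * prior
    return numerator / (numerator + p_if_not * (1 - prior))

# Simulated gameplay log: (indicator, whether it was observed).
events = [
    ("solved_level_with_lever", True),
    ("many_failed_attempts", False),
    ("efficient_energy_solution", True),
]

belief = 0.5  # uninformative prior on the targeted competency
for indicator, observed in events:
    belief = update(belief, indicator, observed)
    # The glassbox part: each step is transparent and explainable.
    print(f"after {indicator}={observed}: P(mastered) = {belief:.2f}")
```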
Does Stealth Assessment allow for better transfer of learning?
Indeed, one of the advantages of stealth assessment, especially in environments like games or simulations, is its ecological validity—how it allows you to develop tasks that closely mirror real-life situations.
If the environment has strong ecological validity, then the claims we make are more likely to hold true. That said, we have to be cautious with how big those claims are. For a dramatic example, just because someone excels at Physics Playground doesn’t mean they’re on track to win a Nobel Prize in physics!
A very interesting area where you’ve used stealth assessment is to measure creativity. Can you tell us more about your work in this area?
In assessing creativity, we use a combined approach that incorporates both "top-down" methods like stealth assessment and "bottom-up" techniques such as machine learning.
The top-down approach begins by defining the construct we want to measure—creativity, in this case—then identifying the evidence needed to assess it.
From there, we design tasks that produce this evidence and collect data from specific features within the tasks. This model, which follows the Competency, Evidence, and Task framework, focuses on what the student creates, or the “student product,” in response to the task.
Stealth assessment involves two key processes: evidence accumulation and evidence processing. The challenge is to sift through the data and identify what’s meaningful. This is where machine learning, including natural language processing (NLP), can be especially useful in filtering the most relevant data.
Critically, we spend a significant amount of time discussing what constitutes the best evidence for assessing constructs like creativity. Generative AI has the potential to enhance this process by supplementing our expertise.
As I mentioned before, stealth assessment is grounded in the Evidence-Centered Design (ECD) framework, which uses highly complex competency models. These models don’t just offer a single score; they break down competencies into sub-scores. For example, a student might score low in creativity due to a lack of flexibility, even though they excel in originality.
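To illustrate what those sub-scores might look like in practice, here is a small hypothetical sketch of a competency model that breaks creativity into facets and reports each facet alongside an overall estimate. The facet names follow common creativity research, but the weights and evidence values are illustrative assumptions, not Prof. Rahimi's actual model.

```python
# Hypothetical sketch of an ECD-style competency model with sub-scores.
# Facet weights and evidence scores are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Facet:
    name: str
    weight: float          # contribution to the overall competency estimate
    evidence: list[float]  # scored observations in [0, 1] from tasks

    @property
    def estimate(self) -> float:
        return sum(self.evidence) / len(self.evidence)

creativity = [
    Facet("fluency",     weight=0.3, evidence=[0.80, 0.70, 0.90]),
    Facet("flexibility", weight=0.3, evidence=[0.20, 0.30, 0.25]),  # weak facet
    Facet("originality", weight=0.4, evidence=[0.90, 0.85, 0.95]),  # strong facet
]

overall = sum(f.weight * f.estimate for f in creativity)

# Report sub-scores, not just a single number, so feedback can be targeted:
# here, low flexibility drags the overall score down despite high originality.
for f in creativity:
    print(f"{f.name:12s}: {f.estimate:.2f}")
print(f"{'overall':12s}: {overall:.2f}")
```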
I came to stealth assessment through Physics Playground, but I’m sure there are many contexts outside gaming where it can be deployed?
Absolutely, you can deploy stealth assessment in virtually any digital learning environment. The only limitation is the range of actions that the environment allows learners to perform, which determines the artifacts they can produce or the behaviors they can exhibit.
For example, I recently designed a stealth assessment of creativity for a programming learning environment called EarSketch, which teaches programming through creating music.
It's an integrated development environment (IDE) where high school students create music by coding. We assessed the creativity of students within the IDE, and I think we achieved that successfully. This wasn’t a simulation or a game—it was a real programming environment.
What are you most excited about in the assessment space?
I'm currently working on several projects that leverage generative AI for assessment, particularly in assessing creativity. I’ve been amazed at how well these models can evaluate students' work based on the instructions we provide. Combining generative AI with stealth assessment is something I’m dedicating a lot of time to.
One exciting area is the development of AI agents that can support students in games while assessing and fostering creativity—this is part of a proposal I've submitted to the National Science Foundation (NSF). These AI agents must be designed with safety in mind, ensuring that students don't become overly reliant on them and that their learning experience remains balanced.
Another aspect I’m really excited about is scalability. I’m deeply interested in finding ways to get these systems into the hands of more students and educators on a larger scale.
I hope you enjoyed this interview with Prof. Rahimi. If you are keen to dig deeper into stealth assessments (why wouldn’t you be!) but intimidated by the academic nature of the topic, I’ve taken some of Prof. Rahimi’s papers and dropped them into NotebookLM to generate this “podcast” on the topic: click here 🔊.
I will consider doing this after most articles - so please do let me know if you found this helpful!
I hope you enjoyed this edition of Nafez’s Notes.
I’m constantly refining my personal thesis on innovation in learning and education. Please do reach out if you have any thoughts on learning - especially as it relates to my favorite problems.
If you are building a startup in the learning space and taking a pedagogy-first approach - I’d love to hear from you. I’m especially keen to talk to people building in the assessment space.
Finally, if you are new here you might also enjoy some of my most popular pieces:
The Gameboy instead of the Metaverse of Education - An attempt to emphasize the importance of modifying the learning process itself as opposed to the technology we are using.
Using First Principles to Push Past the Hype in Edtech - A call to ground all attempts at innovating in edtech in first principles and move beyond the hype.
We knew it was broken. Now we might just have to fix it - An optimistic view on how generative AI will transform education by creating “lower floors and higher ceilings”.