
Ensuring Emergency Management Training Translates into Action

As disasters become more frequent and severe in the U.S., it’s more important than ever to prepare. Every year, significant resources are poured into training programs for disaster management professionals to practice real-life scenarios and tools focused on disaster preparation, response and recovery.

But the big questions are: How can we tell if these trainings are actually working? Are participants directly applying what they’ve learned to their daily work practices or during real disaster responses?

In a new paper, researchers Thomas Chandler, Richard Hendra, Shuyang Huang, Josh DeVincenzo and Yaxuan Yang at Columbia’s National Center for Disaster Preparedness and the New School tackle these questions by using the Kirkpatrick Model to measure the effectiveness of their disaster education training. They offer insights in the Q&A below.

What is the Kirkpatrick Model?

Chandler: The Kirkpatrick Evaluation Model is a foundational tool in the field of training and instructional design. It helps organizations systematically evaluate the effectiveness of their training programs, ensuring that investments in employee development lead to meaningful improvements in performance and organizational success. Developed by Donald Kirkpatrick in the 1950s, the model consists of four levels, each building on the previous one, to provide a comprehensive evaluation of training outcomes. While it has substantial limitations, when used thoughtfully and in combination with other evaluation methods it provides valuable insights into the impact of training initiatives, particularly in the disaster risk management sector. The focus of our paper is on Level 3, which involves knowledge transfer from a training environment to a real-world setting.

Why is the Kirkpatrick Model used to evaluate training?

Hendra: The Kirkpatrick Model has stood the test of time because of its approach to evaluating training in relation to actual skills and practice. It is anchored in a vision of training success: professional training is not only about having a good experience, it is about learning important skills, putting them to work and changing core outcomes for the people we serve. The model can also uncover obstacles that prevent participants from applying new skills or knowledge, allowing organizations to address issues such as a lack of resources, an inability to learn key concepts due to the training modality or insufficient follow-up.

Which Kirkpatrick levels are used to evaluate training?

Huang: The four levels of the Kirkpatrick Model vary in importance and complexity. Levels 1 and 2 are relatively straightforward to measure, focusing on participants’ immediate reactions to the training and short-term learning outcomes. However, Level 3 is more challenging and arguably more critical, as it evaluates how participants apply what they’ve learned in real-world settings. This level moves beyond subjective impressions of the training or pre/post-test scores to examine tangible behavioral changes in the workplace. Level 4, while an admirable goal, is the hardest to measure because it requires pinpointing how much a specific training program has contributed to improving an organization’s overall effectiveness, which is often shaped by many external factors.
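
To make the four levels concrete, here is a minimal sketch of one way a training evaluator might organize data across them for a single participant. The field names and example values are hypothetical illustrations, not measures from the paper.

```python
# A minimal sketch of recording the four Kirkpatrick levels for one participant.
# Field names and values are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class KirkpatrickRecord:
    participant_id: str
    level1_reaction: dict = field(default_factory=dict)  # immediate reactions to the training
    level2_learning: dict = field(default_factory=dict)  # short-term learning, e.g. pre/post tests
    level3_behavior: dict = field(default_factory=dict)  # application of skills on the job
    level4_results: dict = field(default_factory=dict)   # organizational outcomes

record = KirkpatrickRecord(
    participant_id="trainee-042",
    level1_reaction={"satisfaction_1_to_5": 4},
    level2_learning={"pretest_pct": 55, "posttest_pct": 82},
    level3_behavior={"applied_skills_within_90_days": True},
    level4_results={},  # hardest to fill in, since attribution is difficult
)
print(record)
```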

What are some challenges with using a Kirkpatrick Level 3 evaluation?

Chandler: When learners are contacted to reflect on how training may have affected their job performance, there are many challenges in collecting and analyzing the data. Respondents may tell interviewers what they want to hear, which is called social desirability bias. They may misstate the sequence of events, which is called recall bias, or the response rate may simply be very low. And even when behavioral changes occur after a training, it’s difficult to determine whether the training was really the cause or whether other societal factors played a larger role. For instance, the workforce shift to telecommuting during and after the COVID-19 pandemic could produce changes in employee behavior that are larger than the effect of any given training. The importance of getting the data analysis right cannot be overstated, because it can provide a solid justification for implementing emergency management training programs and determine whether such programs are really worth the time, effort and cost.
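
One simple way to gauge the low-response-rate problem is to compare respondents and non-respondents on characteristics the training organization already records for everyone. The sketch below does this with simulated data; the variable (years of experience) and all of the numbers are invented for illustration and are not drawn from the paper.

```python
# A minimal sketch of a non-response check: compare survey respondents and
# non-respondents on a characteristic observed for all trainees.
# All data here are simulated for illustration.
import numpy as np

rng = np.random.default_rng(1)

n_trainees = 200
years_experience = rng.gamma(shape=4.0, scale=2.5, size=n_trainees)

# Suppose more experienced trainees are more likely to answer the follow-up survey.
p_respond = 1.0 / (1.0 + np.exp(-(years_experience - 10.0) / 3.0))
responded = rng.random(n_trainees) < p_respond

print(f"Response rate: {responded.mean():.0%}")
print(f"Mean experience, respondents:     {years_experience[responded].mean():.1f} years")
print(f"Mean experience, non-respondents: {years_experience[~responded].mean():.1f} years")
# A large gap suggests the Level 3 answers over-represent one kind of trainee,
# pointing toward weighting adjustments or targeted follow-up outreach.
```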

What are some ways to statistically control for factors such as selection bias or response bias?

Hendra: In the paper, we talk about the craft of balancing design and data collection constraints to get to a place where we can more reliably measure Kirkpatrick Level 3 outcomes. Even if a training seems to have made a positive impact, we don’t really know for sure unless we consider the counterfactual. That is, what would have happened if the training had never occurred? This is the fundamental problem of program evaluation. In an ideal world, we could create a perfect statistical model to analyze this, including all the differences between responders and non-responders. But it is impossible to measure all the psychological and behavioral factors involved. We have to make do with other methods, including randomized controlled trials, which are powerful but sometimes difficult to implement, or quasi-experimental designs such as interrupted time series and matching, which are all described in this paper. Although randomized controlled trials are the most rigorous, the quasi-experimental methods are a bit easier to employ for training organizations dealing with major deadlines, financial constraints and unpredictable enrollment. There are real tradeoffs between rigor and feasibility to consider in these design and data collection approaches.
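
To give a concrete sense of one quasi-experimental option mentioned above, the sketch below fits a simple interrupted time series (segmented regression) model to simulated monthly data. The outcome, the time windows and every coefficient are invented for illustration; this is not analysis code from the paper.

```python
# A minimal sketch of an interrupted time series (segmented regression) design,
# fit with ordinary least squares on simulated data.
import numpy as np

rng = np.random.default_rng(0)

# Monthly counts of a workplace behavior (e.g., completed preparedness drills),
# 24 months before and 24 months after a hypothetical training program.
n_pre, n_post = 24, 24
t = np.arange(n_pre + n_post, dtype=float)     # time index
post = (t >= n_pre).astype(float)              # 1 after the training starts
t_since = np.where(post == 1, t - n_pre, 0.0)  # months elapsed since the training

# Simulated outcome: a mild pre-existing trend plus a level and slope shift after training.
y = 10 + 0.05 * t + 2.0 * post + 0.10 * t_since + rng.normal(0.0, 1.0, t.size)

# Segmented regression: y = b0 + b1*t + b2*post + b3*t_since
X = np.column_stack([np.ones_like(t), t, post, t_since])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"Pre-existing trend per month:       {beta[1]:.2f}")
print(f"Immediate level change at training: {beta[2]:.2f}")
print(f"Change in trend after training:     {beta[3]:.2f}")
```

By separating an immediate jump from a gradual change in trend, a design like this tries to account for what would likely have happened anyway, which is what makes it useful when a randomized comparison group is not feasible.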

What are some approaches to enhance the validity of, and participation in, Kirkpatrick Level 3 evaluations?

Huang: As mentioned earlier, the COVID-19 pandemic and the corresponding rise of distance learning have spurred increased investment in learning management systems among training organizations. Many of these systems can automate online Level 3 evaluations, enabling the seamless dissemination of surveys to a large number of participants. This automation not only increases the response rate but also mitigates challenges such as recall and social desirability biases. In the paper, we also discuss how learning management system integration can enhance the evaluation process by offering personalized feedback, reinforcing legitimacy through branded communications and enabling the collection of timely, actionable data. These improvements help organizations continually refine their training programs based on comprehensive and reliable participant insights.
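
As a rough illustration of what this kind of automation can look like at its simplest, the sketch below schedules a Level 3 follow-up survey a fixed number of days after course completion. The 90-day window, the completion records and the send_survey placeholder are assumptions made for the example, not features of any particular learning management system.

```python
# A minimal sketch of automated Level 3 follow-up scheduling. The send_survey
# function is a placeholder for whatever email or LMS integration is used.
from datetime import date, timedelta

FOLLOW_UP_DAYS = 90  # hypothetical follow-up window

completions = {
    "trainee-001": date(2024, 3, 15),
    "trainee-002": date(2024, 5, 2),
}

def send_survey(trainee_id: str) -> None:
    # Placeholder for a real survey dispatch (email, LMS message, etc.).
    print(f"Sending Level 3 follow-up survey to {trainee_id}")

def dispatch_due_surveys(today: date) -> None:
    # Send the follow-up survey to anyone whose follow-up window has elapsed.
    for trainee_id, completed_on in completions.items():
        if today >= completed_on + timedelta(days=FOLLOW_UP_DAYS):
            send_survey(trainee_id)

dispatch_due_surveys(date(2024, 7, 1))
```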

Do you have final thoughts regarding using the Kirkpatrick Model?

Chandler: The Kirkpatrick Evaluation Model has been widely used and effective for more than 70 years because it helps us better determine whether training programs are having a positive real-world impact, which is extremely important in the area of disaster risk management. If a training program focuses on life-saving measures, for instance, we have to do everything we can to ensure it’s really adding value. But there are also a number of limitations to consider. When it comes to Level 3 evaluations, we suggest that program managers and instructional designers weigh the potential biases and data collection challenges while planning for the approaches that best suit the needs of their organization. To help readers get the most out of the model, we’ve outlined some key steps for using it effectively. You can view the white paper for more details.


Thomas Chandler is the deputy director of the National Center for Disaster Preparedness and an associate member of the Columbia Climate School faculty. He is also the director of the National Center for Disaster Preparedness FEMA training projects.

Richard Hendra is an instructor of Statistics and Methods at the School of Public Engagement at the New School. He has recently taught Quantitative Methods, Advanced Quantitative Methods, and Quantitative Research Design.

Shuyang Huang is a staff associate working on data management and analysis at the National Center for Disaster Preparedness. She is currently leading Kirkpatrick Level 3 evaluation efforts on behalf of the center.
