New Initiative Evaluates Fidelity of Simulations Used in Training

Virtual reality (VR)-based simulation systems have become a crucial training tool across a wide range of mission areas within the U.S. Department of Defense (DoD). Unfortunately, a lack of standardization for defining different levels of simulation fidelity presents a significant challenge for developing, purchasing, and evaluating the effectiveness of these systems.

Evaluating fidelity of simulations: GTRI Research Scientist Dylan Bush demonstrates an F-16 simulation that was used to evaluate the inter-rater reliability of the simulation fidelity scale. (Credit: Christopher Moore)

A new approach to assessing simulation fidelity being developed by human factors researchers at the Georgia Tech Research Institute (GTRI) could help address that challenge with both a framework and rating scale that decompose training tasks into specific task elements for categorization across multiple dimensions of simulation fidelity. The standardized approach to quantifying simulation fidelity could facilitate efforts to broadly assess the effectiveness of training programs and support the development of system requirements for future simulation-based training efforts.

“The overall aim of this solution is to provide a standardized and repeatable approach to categorizing and defining simulation fidelity that goes beyond arbitrary terms such as ‘low-fidelity’ or ‘high-fidelity,’” said Dylan Bush, a GTRI research scientist who is leading the project. “Without explicit definitions of different simulation technologies, it is difficult to analyze data from studies evaluating training programs in the aggregate.”

The research team’s work to evaluate the new capability will be described at the Human Factors and Ergonomics Society’s (HFES) 66th International Annual Meeting in October. The work has been supported by GTRI’s Independent Research and Development program.

Simulator-based training allows warfighters to repeatedly practice potentially dangerous training scenarios with significantly reduced risk, more convenience, and lower cost. But these VR-based simulations are often developed or acquired without a full understanding of the extent to which the tasks being trained are suitable for, or would benefit from, the training program, Bush said. Without objective criteria for evaluating simulation fidelity, it can be difficult to assess the benefits that can be derived from the training – and the level of realism necessary to create effective simulations.

Development of the new approach began with a three-step process that: 1) broke down the simulations into individual tasks; 2) applied principles of cognitive psychology to divide fidelity concepts into perception, cognition, and action components; and 3) developed the Simulation Fidelity (SiFi) scale for evaluating how well the simulation matches real-world components.

The project builds on earlier work aimed at objectively evaluating the realism of simulations.

“You can’t rate the fidelity of a system without looking at it through the context of the tasks that it needs to support,” Bush said. “While it may seem counterintuitive, fidelity as a construct is really centered on what information and interactions the user needs to complete the task.”

The GTRI system relies on human evaluators to rate both the physical elements of fidelity (visual, auditory, and tactile) and the cognitive aspects, including human interaction and resulting system behavior. Task elements are rated on a six-point scale that measures how well each simulation element compares to the real-world task it is attempting to simulate.

The ratings range from 0, meaning an element is not present, up to 5, meaning an element is indistinguishable from the real-world form it is attempting to simulate. Ratings for each element are aggregated together to create an overall score.
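As a rough illustration of the rating scheme described above, the Python sketch below records one 0-to-5 rating per task element and fidelity dimension and then combines them. The article does not say how the ratings are aggregated, so the unweighted mean used here is an assumption, and the class, function, and task-element names are hypothetical rather than part of the SiFi scale itself.

```python
from collections import defaultdict
from dataclasses import dataclass

# Dimension names follow the article; all other names here are hypothetical.
DIMENSIONS = ("visual", "auditory", "tactile", "human_interaction", "system_behavior")


@dataclass(frozen=True)
class ElementRating:
    task_element: str  # e.g. "read altimeter" (illustrative only)
    dimension: str     # one of DIMENSIONS
    score: int         # 0 = not present ... 5 = indistinguishable from the real world

    def __post_init__(self):
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {self.dimension!r}")
        if not 0 <= self.score <= 5:
            raise ValueError("score must be between 0 and 5")


def overall_score(ratings):
    """Aggregate ratings with an unweighted mean (an assumed aggregation rule)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(r.score for r in ratings) / len(ratings)


def score_by_dimension(ratings):
    """Mean score per fidelity dimension, for dimensions that were rated."""
    grouped = defaultdict(list)
    for r in ratings:
        grouped[r.dimension].append(r.score)
    return {dim: sum(scores) / len(scores) for dim, scores in grouped.items()}


# Made-up ratings for illustration; these are not data from the study.
example_ratings = [
    ElementRating("read altimeter", "visual", 4),
    ElementRating("read altimeter", "auditory", 0),
    ElementRating("move throttle", "tactile", 2),
]
example_overall = overall_score(example_ratings)
example_by_dimension = score_by_dimension(example_ratings)
```

Keeping the per-dimension breakdown alongside the overall score reflects the article's point that fidelity is judged element by element rather than with a single "low" or "high" label.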

To evaluate the inter-rater reliability of the scale, or how consistently different raters assign similar ratings to the same element, the researchers enlisted two former F-16 pilots from the GTRI research staff, who completed a series of flight maneuvers in an F-16 VR simulator. After completing the maneuvers, each rater used the scale to rate 117 task elements.

The results of the inter-rater reliability analysis indicated a strong degree of reliability (k = 0.81), but also identified areas where improvements could be made in certain components of the scale. Bush and colleague Andrew Braun, also from GTRI, would like to conduct additional research using a larger group of raters and potentially refine the definitions used in the scale.
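The article reports the reliability figure without naming the statistic behind it. One common choice for two raters is Cohen's kappa, which compares observed agreement with the agreement expected by chance. The sketch below assumes that statistic in its unweighted form; a weighted variant may suit an ordinal 0-to-5 scale better, and the function name and sample ratings are illustrative, not the study's method or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of identical ratings and p_e is the agreement expected by chance from
    each rater's own distribution of ratings.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")

    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Made-up ratings on the 0-5 scale; not data from the F-16 study.
identical = cohens_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5])
chance_level = cohens_kappa([1, 1, 0, 0], [1, 0, 1, 0])
```

With identical ratings the function returns 1, and when agreement is no better than chance it returns 0, which is the range against which a value such as 0.81 would be read.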

“These additional analyses would not only further investigate the reliability of the scale, but would also investigate how well the scale can be generalized across different simulation contexts,” the authors wrote in their HFES paper.

Beyond supporting the ability to evaluate the effectiveness of simulations, SiFi could help human factors researchers aggregate evaluations of different studies, allowing them to learn more about the impact of training simulations. Improving standardization could also help DoD purchasing personnel improve the specifications for future simulation projects.

Writer: John Toon (john.toon@gtri.gatech.edu)
GTRI Communications
Georgia Tech Research Institute
Atlanta, Georgia USA
