Presentation
When Usability Undermines Explainability: A Case Study Designing AI-Enabled Tools in Healthcare
Description
Introduction:
As advanced technologies, like artificial intelligence (AI) and machine learning (ML), are increasingly integrated into healthcare tools, interface designers face a growing list of considerations. Designing AI-enabled tools requires supporting additional AI-specific interactions (e.g., explainability or observability of the AI system) that can extend beyond traditional usability considerations. These additional demands can introduce new use cases and contexts for those designing and evaluating interfaces. For example, AI-enabled tools must support users’ ability to understand not only what the AI is doing and why, but also when and how the AI might be wrong (Rayo et al., 2020). Efforts to address usability and explainability are often led by separate disciplines using different sets of principles: human factors professionals often focus on usability and use design heuristics (Nielsen, 1994), while computer scientists often focus on explainability and use techniques for revealing how AI/ML models work (Adadi & Berrada, 2018). The efforts of both are expected to integrate seamlessly into the development of AI-enabled tools and interfaces that are both usable and understandable. However, principles of usability and explainability can sometimes conflict and present challenging trade-offs for interface designers.
In this research, we illustrate some of the challenges facing designers of AI-enabled tools, such as conflicts between usability and explainability. We analyzed verbal protocols from 30 practicing nurses interacting with an AI-enabled decision support tool (Morey & Rayo, 2024) and synthesized a series of illustrative vignettes. Our findings reveal how an overemphasis on usability principles can inadvertently undermine the effectiveness of AI explainability, and vice versa. These findings highlight the need for a holistic approach to designing AI-enabled tools, one that addresses the additional considerations introduced by AI capabilities and closes the gap between traditional usability and explainability.
Methods:
We analyzed verbal protocols from prior research (Morey & Rayo, 2024) involving 30 practicing nurses interacting with an AI-enabled decision support tool. Nurses were presented with a series of patient cases, each including a brief background about the patient, a visualization of patient information (e.g., vitals, lab results), two predictive AI/ML models, and a visual explanation of the AI/ML predictions. Nurses were instructed to think aloud as they made sense of what was happening with the patients, and their verbal protocols were recorded. Nurses were also explicitly prompted to respond to the question: “Do you think the machine is concerned about this patient? Why or why not?” Transcripts from the audio-recorded sessions were then coded with an inductive approach consistent with grounded theory. This approach begins with line-by-line coding that uses the nurses’ own language to describe the actions, intentions, and assumptions in each line of data, then iterates toward more generic processes and patterns. In this paper, we present a subset of findings relevant to the usability of both the patient data visualizations and the AI/ML models.
Vignettes:
Three vignettes emerged from empirical observations of how practicing nurses interacted with the AI-enabled display. These vignettes highlight some unanticipated outcomes that arise when the design strategy leans heavily on interface usability, such as focusing on consistency and conventions, or when it leans heavily on promoting correct attribution of algorithm concern.
Vignette 1: Interpreting whether the machine is concerned
The cues participants used to interpret the machine’s concern can be divided into three categories: (1) directly using the machine prediction as the machine’s concern level; (2) inferring the machine’s concern level from the presence and number of red marks (e.g., alarms, explainability annotations) on the interface without referencing the machine prediction; (3) inferring the machine’s concern level from all other information on the interface without referencing the machine prediction or red marks.
The observed heterogeneity in the cues nurses relied on to interpret the machine’s concern level demonstrates that nurses may make sense of the machine using cues other than those designers anticipated. This suggests designers should consider the whole integrated display when evaluating the effectiveness of explainability techniques.
Vignette 2: Interpreting red marks on the display
When asked what they thought the machine was concerned about, nurses commonly cited red marks on the interface as factors in machine concern, even marks that fell outside the two data sources (heart rate and pulse oximetry) the AI algorithm used. Nurses frequently attributed other red elements of the display (e.g., critical lab values, visual alarms) to the machine’s concern, despite being instructed that these data sources were beyond the algorithm’s purview.
The designer explicitly chose to use red to draw attention to concerning and out-of-range data sources in order to enhance the consistency and coherence of the interface, in adherence with usability heuristics (Nielsen, 1994). However, the use of red marks throughout the interface negatively impacted nurses’ ability to correctly identify the data that contributed to the AI prediction.
Vignette 3: Location of cues attributed to machine concern
While nurses frequently and incorrectly treated data sources on the patient interface as inputs the AI could incorporate, they rarely identified information from the patient background as contributing to machine concern. This information was physically separated from the interface where the AI prediction was shown, which may have reduced nurses’ tendency to misattribute AI concern.
Although separating information reduced misattribution, this strategy carries notable risks, including reduced visual momentum, which could hinder overall task performance. Our findings suggest there may be a tension between integrating information and the risk of unintended connection or attribution.
Discussion:
Our findings suggest that adding AI to a decision display introduces additional cognitive functions to support, which can force trade-off decisions in interface design. Single-minded pursuit of either usability or explainability can undermine the other. At minimum, interface designers and AI developers must be aware of these potential unintended consequences. Our results suggest that designing and developing effective integrated AI-enabled tools likely requires human factors professionals and AI developers to collaborate early in the design process to balance these potential trade-offs. These findings also highlight the need for new synergistic ways to combine AI/ML design guidance and traditional usability practice into a holistic design process that supports joint human-AI activity.
References:
Adadi, A., & Berrada, M. (2018). Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access, 6, 52138–52160. https://doi.org/10.1109/ACCESS.2018.2870052
Morey, D. A., & Rayo, M. F. (2024). Situated Interpretation and Data: Explainability to Convey Machine Misalignment. IEEE Transactions on Human-Machine Systems, 54(1), 100–109. https://doi.org/10.1109/THMS.2023.3334988
Nielsen, J. (1994). Enhancing the explanatory power of usability heuristics. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 152–158.
Rayo, M. F., Fitzgerald, M. C., Gifford, R. C., Morey, D. A., Reynolds, M. E., D’Annolfo, K., & Jefferies, C. M. (2020). The Need for Machine Fitness Assessment: Enabling Joint Human-Machine Performance in Consumer Health Technologies. Proceedings of the International Symposium on Human Factors and Ergonomics in Health Care, 9, 40–42. https://doi.org/10.1177/2327857920091041
Event Type: Oral Presentations
Time: Monday, March 31, 2:30pm - 3:00pm EDT
Location: Pier 2/3
Digital Health (DH)


