Presentation

DH7 - Conducting User-Centered Evaluations Across an Emerging Ecosystem of Prototype Bioinformatics Tools
Description

Background and Significance:

This presentation discusses methods and protocols being developed to guide the creation of an emerging ecosystem of disparate, highly automated biomedical data aggregation and analytics tools built by more than a dozen independent teams. These methods include user-centered evaluations as well as new methods that specifically assess how advanced automation tools work jointly with their human counterparts. Developing next-generation treatments and cures requires new forms of information services and analytics that automatically integrate biomedical data within a unified, semantically meaningful framework, with interfaces that diverse users can access and explore intuitively. Such enabling bioinformatics tools provide value only if they are broadly accessed, adopted, and used. The tools must be useful within established work contexts and enthusiastically adopted by a broad stakeholder community, including researcher, clinician, and patient user groups with varying levels of biomedical literacy. To achieve this, user needs and constraints must be identified and considered throughout development. Tools must also be assessed to determine whether they successfully address these needs and constraints, and then refined accordingly when deficiencies or missed opportunities are identified.

We describe our effort to address these challenges through a multidisciplinary approach to supporting health IT development. The approach draws on techniques from human factors, cognitive systems engineering, user-centered design, and good software development practices. In our role, we act as champions of the user. We deliver user evaluation insights to tool developers, for both component tool design and cross-tool functionality. We also inform the evolution of piece-part tools into a cohesive whole that addresses actual user needs and achieves critical user engagement. From this perspective, we will discuss the complexities and challenges of selecting, tailoring, and applying an overarching user-centered evaluation process. This process operates within a real-world, real-time technology development effort, for an emerging collection of biomedical data fabric tools at varying stages of fidelity and functional completeness. To accomplish this assessment and broadly foster user-centered development of interoperable tools, the approach applies demonstrated human factors and cognitive systems engineering (CSE; Woods & Roth, 1988) methods. These methods holistically support technology requirements definition, design, development, and assessment activities. We bring new, force-multiplying resources to the broader community of tool developers by establishing a common, community-wide user-centric perspective. This perspective helps individual teams build the right tools and deliver them in useful and impactful ways. It complements technology-focused development, which often lacks it.

Approach:

Our approach aims to provide a common set of resources and processes for grounding individual tool- and system-level requirements analyses and for advocating key user group considerations. It also provides a range of user-centered evaluation, software integration, and validation processes that can be applied flexibly across a diverse set of tools with differing technical scopes and levels of maturity. These methods are outlined below.

Supporting Requirements Definition: Drawing on established CSE and human factors methodologies, our team helps tool developers define requirements through multiple knowledge elicitation sessions with stakeholders and representative users. We also support development of community-wide operational use cases, including representative challenge scenarios for evaluating the usability, usefulness, and impact of the tools. In doing so, we apply principled CSE and human–AI interaction techniques (“Human-Machine Teaming Systems Engineering Guide,” 2018; National Academies of Sciences, Engineering, and Medicine, 2022; Oswald et al., 2022). These techniques make our user testing framework a common test harness for weighing the strengths and weaknesses of both human and AI capabilities within representative contexts (Hollnagel & Woods, 2005).

Leading Evaluation Activities: We conduct a series of iterative, user-centered test activities. We disseminate the results, along with associated design remediation insights, to tool developers so that they can refine and improve their tools throughout the program. Evaluation methods include the following:
• Heuristic evaluations by CSE/human factors experts of the usability, usefulness, and impact of tools. These evaluations assess whether tool designs conform both to good user-centered design practices (e.g., Nielsen’s 10 usability heuristics; Nielsen, 2005) and to human–AI teaming design practices (e.g., human–AI joint activity design heuristics; Morey et al., 2023). Our innovative approach to assessing human–AI teaming takes a Joint Cognitive Systems perspective and examines how to make AI a team player that provides affordances for conducting work.

• Cognitive walk-throughs with representative users to assess the usability, usefulness, and impact of tools. These are conducted through regular, facilitated interaction with a User Advocacy Committee (UAC) that we assemble and manage as a community resource. The UAC is composed of representative researcher, clinician, and patient users across biomedical literacy levels for an initial target use case in the cancer domain. Assessment includes identifying opportunities and challenges in designs for iterative refinement. Activities range from face validity checks to more “day-in-the-life” run-throughs articulating how the system is used. Face validity checks include, for example, reviews of data ontologies/architectures by biomedical data subject matter experts (SMEs). Run-throughs use early functional designs and unified modeling language models, and later interaction wireframes and design mock-ups for discussing presentation strategies and interaction methods.

• Low-level validation of software performance. To fully support user testing, we must work closely with each tool developer. We need to understand what information their technology will consume and produce as part of a broader tool ecosystem and user workflow, and which parameters we can influence as part of our testing protocol. Our software test harness centers on application programming interfaces (APIs) that capture the expected inputs and outputs of each technical area (TA) according to the program’s requirements. We collaborate with each tool developer to understand and refine these APIs to enable user-centered testing of their component. Testing through these common APIs keeps evaluation consistent across tools. Understanding each component’s individual inputs and outputs gives us flexibility in how we manipulate a specific component during testing. A verification and validation (V&V) pipeline, which we are designing and implementing, supports integration and conformance with good software design practices. The pipeline will be used to perform unit, integration, and acceptance testing of individual tool components.

• Formal summative validation studies of maturing technology solutions within the context of real user populations and work domains. Using a broad set of demonstrated evaluation techniques and metrics, we plan to address a specific question: “Do these tools effectively support diverse biomedical data users within targeted use case contexts?” We will evaluate usability through quantitative measurements of system navigation, ease of data entry, and system feedback features. We will collect subjective user assessments of information processing support and workload (e.g., NASA Task Load Index; Hart & Staveland, 1988). We will also assess perceived usability and the aesthetics of the controls and displays, using validated instruments such as the System Usability Scale (SUS; Brooke, 1996). Our evaluation of usefulness and impact will focus on the intuitiveness, transparency, directability, joint activity, and adoption of the user interface. It will include real-time observations of task performance and subjective assessments (e.g., technology self-efficacy and adoption; “Technology Readiness Index Primer,” 2014; Ulfert-Blank & Schmidt, 2022; Yi, Tung, & Wu, 2003). To specifically assess how well the tools support the joint activity of the human–AI team, we will draw on joint cognitive systems (JCS) perspectives. The joint activity testing (JAT) method, first proposed by Morey, Marquisee, Gifford, Fitzgerald, and Rayo (2020), will be used to enable interpolation and extrapolation beyond limited testing sets. No testing set can ever be fully representative or exhaustive of the types and degrees of challenges a system will face during operations. Evaluation methods must therefore reliably extrapolate insights beyond the boundaries of limited testing sets. Otherwise, deployed technology risks exhibiting far more brittleness than its designers imagined (e.g., Roth et al., 1987; Beede et al., 2020).
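The API-centered test harness in the low-level validation bullet can be illustrated with a small contract check. This is a hypothetical sketch, not the team’s harness. It assumes each component’s contract can be written as field names with expected Python types. Every name in it (`ComponentContract`, `check_fields`, `run_contract_test`, the `cohort_query` component and its fields) is invented for the example. It covers only API conformance; the unit, integration, and acceptance stages of the V&V pipeline are not shown.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ComponentContract:
    """Hypothetical declaration of what one tool component consumes and produces."""
    name: str
    inputs: dict[str, type]   # input field name -> expected Python type
    outputs: dict[str, type]  # output field name -> expected Python type


def check_fields(payload: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """List contract violations in `payload`; an empty list means it conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in payload:
            problems.append(f"missing field '{field}'")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(f"field '{field}' is {actual}, expected {expected.__name__}")
    return problems


def run_contract_test(
    contract: ComponentContract,
    component: Callable[[dict[str, Any]], dict[str, Any]],
    test_input: dict[str, Any],
    overrides: Optional[dict[str, Any]] = None,
) -> list[str]:
    """Run one component on one test case and check both sides of its contract.

    `overrides` replaces selected input fields, so an evaluator can vary one
    parameter while the rest of the test case stays fixed.
    """
    payload = {**test_input, **(overrides or {})}
    input_problems = check_fields(payload, contract.inputs)
    if input_problems:
        # Do not call the component with a request that breaks its contract.
        return [f"{contract.name} input: {p}" for p in input_problems]
    output = component(payload)
    return [f"{contract.name} output: {p}" for p in check_fields(output, contract.outputs)]


# Hypothetical cohort-query component, used only to exercise the harness.
COHORT_CONTRACT = ComponentContract(
    name="cohort_query",
    inputs={"diagnosis_code": str, "max_results": int},
    outputs={"patient_ids": list, "query_time_s": float},
)


def fake_cohort_query(request: dict[str, Any]) -> dict[str, Any]:
    """Stand-in component that returns canned IDs, truncated to max_results."""
    ids = ["p001", "p002", "p003"]
    return {"patient_ids": ids[: request["max_results"]], "query_time_s": 0.02}


if __name__ == "__main__":
    case = {"diagnosis_code": "C50", "max_results": 2}  # illustrative test case
    print(run_contract_test(COHORT_CONTRACT, fake_cohort_query, case))
    print(run_contract_test(COHORT_CONTRACT, fake_cohort_query, case, {"max_results": "2"}))
```

The `overrides` argument mirrors the point that knowing a component’s inputs lets evaluators change one input while holding the rest of a test case constant. Checking inputs before calling the component keeps a malformed test case from being reported as a component failure.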
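The subjective instruments named in the summative validation bullet have published scoring rules, sketched below in Python. SUS scoring follows Brooke (1996):

- Odd-numbered items contribute the rating minus 1.
- Even-numbered items contribute 5 minus the rating.
- The sum is multiplied by 2.5 to give a 0-100 score.

The NASA-TLX functions assume six subscale ratings on a 0-100 scale. They compute either the unweighted mean (often called Raw TLX) or the weighted score from the 15 pairwise subscale comparisons. The function names and input formats are assumptions made for illustration, not part of either instrument.

```python
from itertools import combinations

# The six NASA-TLX subscales (Hart & Staveland, 1988).
TLX_SCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def sus_score(responses: list[int]) -> float:
    """Score one SUS form: ten ratings in questionnaire order, each 1-5."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects ten ratings between 1 and 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item_number % 2 == 1 else (5 - rating)
    return total * 2.5  # scale the 0-40 sum to 0-100


def tlx_raw(ratings: dict[str, float]) -> float:
    """Unweighted (raw) TLX: mean of the six 0-100 subscale ratings."""
    return sum(ratings[s] for s in TLX_SCALES) / len(TLX_SCALES)


def tlx_weighted(ratings: dict[str, float], winners: dict[tuple[str, str], str]) -> float:
    """Weighted TLX from the 15 pairwise comparisons.

    `winners` maps each subscale pair, ordered as in TLX_SCALES, to the
    subscale the participant judged the greater contributor to workload.
    A subscale's weight (0-5) is the number of comparisons it won, so the
    weights sum to 15.
    """
    pairs = list(combinations(TLX_SCALES, 2))
    if set(winners) != set(pairs):
        raise ValueError("expected one judgment for each of the 15 subscale pairs")
    weights = dict.fromkeys(TLX_SCALES, 0)
    for pair in pairs:
        if winners[pair] not in pair:
            raise ValueError(f"winner for {pair} must be one of the pair")
        weights[winners[pair]] += 1
    return sum(ratings[s] * weights[s] for s in TLX_SCALES) / len(pairs)


if __name__ == "__main__":
    print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```

Keeping raw and weighted TLX as separate functions lets an evaluation report either variant from the same ratings.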
Event Type
Poster Presentation
Time
Tuesday, April 1, 4:45pm - 6:15pm EDT
Location
Frontenac Foyer