GAZER : Model-active human-centered artificial intelligence for medical visual assistance

Overview


Human-centered (HC) artificial intelligence (AI) brings human doctors and AI systems into collaboration: it keeps decision power with the doctor, who remains responsible, while improving decision outcomes, and it demonstrates great potential in medical visual assistance. During the interaction, the AI assists doctors in recognizing objects in medical images, enhancing their capabilities and empowering them to achieve their clinical goals. In particular, with the development of foundation models, e.g., the Segment Anything Model (SAM) and LLaVA, their emergence and homogenization capabilities enable HCAI to be practiced in a wide range of medical scenarios through interaction. HCAI is therefore poised to broadly reshape medical imaging and shows a promising future in energizing the capabilities of doctors.

Highlights

  • It provides AI-driven human-AI collaboration that empowers doctors' medical capability without interfering with their clinical workflow.
  • We propose a novel gaze-prompted Segment Anything Model, GAZER, for model-active interaction.
  • A plug-and-play gaze point filter (GPF) module stimulates the gaze-prompt-based emergence ability of foundation models without any additional training.
  • We present gaze prompt learning (GPL), which learns to interpret gaze maps and understand human intentions in GAZER, thus assisting medical visual practice in professional scenarios.
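The gaze point filter idea can be illustrated with a minimal sketch. This is our own hypothetical implementation, not the released GAZER code: it collapses a raw gaze trace into fixation centers (dropping saccade samples), which could then serve as foreground point prompts in SAM's convention.

```python
import numpy as np

def filter_gaze_points(gaze_xy, eps=15.0, min_samples=5):
    """Collapse a raw gaze trace into fixation centers (hypothetical GPF sketch).

    Greedy spatial clustering: consecutive samples within `eps` pixels of the
    running cluster center are grouped; clusters with at least `min_samples`
    points become fixations, so brief saccade samples are filtered out.
    """
    fixations, cluster = [], [gaze_xy[0]]
    for p in gaze_xy[1:]:
        center = np.mean(cluster, axis=0)
        if np.linalg.norm(p - center) <= eps:
            cluster.append(p)
        else:
            if len(cluster) >= min_samples:
                fixations.append(np.mean(cluster, axis=0))
            cluster = [p]
    if len(cluster) >= min_samples:
        fixations.append(np.mean(cluster, axis=0))
    return np.array(fixations)

# Simulated gaze trace: two fixations joined by a short saccade.
rng = np.random.default_rng(0)
fix_a = rng.normal([100, 120], 3, size=(30, 2))
saccade = np.linspace([100, 120], [300, 200], 4)
fix_b = rng.normal([300, 200], 3, size=(30, 2))
trace = np.concatenate([fix_a, saccade, fix_b])

points = filter_gaze_points(trace)          # one (x, y) center per fixation
labels = np.ones(len(points), dtype=int)    # SAM convention: 1 = foreground point
print(points.shape)                         # (2, 2)
```

The `(points, labels)` pair matches the shape of SAM's point-prompt input, so a filter like this can sit in front of a frozen point prompt encoder without retraining.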

GAZER is flexibly and seamlessly applied in a variety of clinical scenarios

  • Assist doctors in positioning and prompt them with relevant information during ultrasound scanning.
  • Locate lesions during interventional procedures.
  • Observe where inexperienced doctors gaze for medical education.
  • Assist doctors in locating and diagnosing multiple objects in an image.

The framework of our GAZER


  • GAZER embeds the gaze as a prompt into the SAM (Kirillov et al., 2023) architecture for interactive extraction of semantic regions in medical images according to the gaze. It has two modes (GAZER-GPF and GAZER-GPL), combining universality with professional reliability.
  • The GAZER-GPF mode encodes gaze points into the model's inference process, embedding the gaze into SAM's point prompt encoder without any additional training.
  • The GAZER-GPL mode trains a gaze-map prompt encoder within SAM's segmentation learning to understand human intentions from gaze maps, achieving reliable segmentation in professional medical scenarios.
  • GAZER-GPL is able to understand the distinctions between different gaze maps

    GAZER-GPL better understands and distinguishes different gaze maps used as prompts. We obtain prompt embeddings from the prompt encoders of the different models, apply t-SNE for dimensionality reduction, and visualize the result. After fine-tuning GAZER, we found that GAZER-GPL could still effectively distinguish between the various objects.
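The visualization procedure above can be sketched as follows. This is a toy illustration with synthetic stand-in embeddings, not GAZER's actual prompt encoder outputs: clustered high-dimensional vectors play the role of prompt embeddings for three object classes, and scikit-learn's t-SNE projects them to 2-D for plotting.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in prompt embeddings for three object classes (hypothetical:
# in GAZER these would come from the gaze prompt encoder).
rng = np.random.default_rng(1)
per_class, dim = 20, 256
embeddings = np.concatenate([
    rng.normal(loc=c * 5.0, scale=0.5, size=(per_class, dim))
    for c in range(3)
])

# Reduce the 256-d embeddings to 2-D for visualization.
coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(
    embeddings.astype(np.float32)
)
print(coords.shape)  # (60, 2)
# `coords` can then be scatter-plotted, colored by object class.
```

Well-separated prompt embeddings produce clearly separated clusters in the 2-D map, which is the qualitative evidence the figure above conveys.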

BibTeX

If you find our project or pre-trained parameters useful in your research, please cite: