The convergence of digital transformation in biomedicine with the current generative AI revolution presents an unprecedented opportunity to advance precision health. Digital pathology is at the forefront of this frontier, and it is transforming cancer care as whole-slide imaging becomes routinely available. These high-resolution digital images provide crucial information for deciphering the tumor microenvironment, which is particularly important for precision immunotherapy. However, digital pathology poses unique computational challenges. A standard gigapixel slide is thousands of times larger than a typical natural image, which makes it difficult to apply conventional vision transformers directly.
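To make the scale gap concrete, the following back-of-the-envelope sketch compares a slide with a natural image. All sizes are illustrative assumptions rather than figures from the GigaPath work: a hypothetical 40,000 x 25,000 pixel slide, a 224 x 224 natural image, and 256 x 256 tiles.

```python
import math

# Hypothetical sizes, chosen only for illustration (not taken from the article).
SLIDE_W, SLIDE_H = 40_000, 25_000   # roughly one gigapixel
NATURAL_SIDE = 224                  # a common natural-image input size
TILE_SIDE = 256                     # assumed tile size

slide_pixels = SLIDE_W * SLIDE_H
pixel_ratio = slide_pixels // (NATURAL_SIDE * NATURAL_SIDE)

# Number of tiles needed to cover the slide, rounding up at the edges.
tiles = math.ceil(SLIDE_W / TILE_SIDE) * math.ceil(SLIDE_H / TILE_SIDE)

# Full self-attention over tiles compares every tile with every other tile.
full_attention_pairs = tiles * tiles

print(f"slide pixels:          {slide_pixels:,}")
print(f"slide / image ratio:   about {pixel_ratio:,}x")
print(f"tiles per slide:       {tiles:,}")
print(f"full attention pairs:  {full_attention_pairs:,}")
```

Under these assumptions, a single slide yields tens of thousands of tiles, and the quadratic growth of full self-attention over that many tokens is what makes a conventional vision transformer impractical at whole-slide scale.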

To address these challenges, Microsoft has introduced GigaPath, a novel vision transformer that models whole slides by using dilated self-attention to keep computation manageable. Prov-GigaPath, developed in collaboration with Providence Health System and the University of Washington, is an open-access whole-slide pathology foundation model built on this architecture and pretrained on more than one billion pathology image tiles. Prov-GigaPath achieves state-of-the-art performance on cancer classification and pathomics tasks, which highlights the importance of whole-slide modeling on real-world data for advancing patient care and clinical discovery.

GigaPath adopts a two-stage curriculum learning approach: tile-level pretraining using DINOv2, followed by slide-level pretraining using LongNet with a masked autoencoder. By adapting dilated attention from LongNet to digital pathology, GigaPath captures long-range dependencies across a slide while keeping computation tractable. As a result, Prov-GigaPath outperforms competing models on numerous cancer subtyping and pathomics benchmarks.
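The idea behind dilated attention can be sketched in a few lines. This is a simplified illustration of the general technique, not the GigaPath or LongNet implementation: the sequence of tile embeddings is split into segments, only every `dilation`-th position in each segment is kept, and attention is computed within each sparsified segment. The function names, the `offset` parameter, and the sizes below are assumptions made for this example.

```python
def dilated_indices(seq_len, segment_len, dilation, offset=0):
    """Return, per segment, the token positions that attend to each other.

    The sequence is cut into consecutive segments of `segment_len` tokens,
    and within each segment only every `dilation`-th token (starting at
    `offset`) is kept.
    """
    groups = []
    for start in range(0, seq_len, segment_len):
        segment = range(start, min(start + segment_len, seq_len))
        groups.append(list(segment[offset::dilation]))
    return groups


def attention_pairs(groups):
    """Count query-key pairs when attention is dense inside each group."""
    return sum(len(group) ** 2 for group in groups)


# Small example: 8 tokens, segments of 4, keep every 2nd token.
print(dilated_indices(8, 4, 2))            # [[0, 2], [4, 6]]
print(dilated_indices(8, 4, 2, offset=1))  # [[1, 3], [5, 7]]

# Illustrative cost comparison for a hypothetical 16,384-token slide.
seq_len = 16_384
sparse_pairs = attention_pairs(dilated_indices(seq_len, 2_048, 4))
full_pairs = seq_len * seq_len
print(f"full: {full_pairs:,}  dilated: {sparse_pairs:,}  "
      f"reduction: {full_pairs // sparse_pairs}x")
```

A single segment length and dilation rate only sees part of the sequence. LongNet-style models combine several such configurations, with short dense segments for local detail and long sparse ones for global context, so that the mixture covers both nearby and distant tiles.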

GigaPath also extends to vision-language tasks, such as zero-shot cancer subtyping and gene mutation prediction, by incorporating pathology reports. This approach uses Prov-GigaPath as the whole-slide image encoder and PubMedBERT as the text encoder, showcasing the potential of multimodal generative AI for precision health.
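Zero-shot prediction with paired image and text encoders is commonly done by comparing embeddings in a shared space. The sketch below assumes that setup and uses made-up three-dimensional vectors in place of real encoder outputs. It does not call Prov-GigaPath or PubMedBERT, and the function names and class prompts are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(slide_embedding, class_text_embeddings):
    """Pick the class whose text embedding is closest to the slide embedding."""
    scores = {
        label: cosine_similarity(slide_embedding, text_embedding)
        for label, text_embedding in class_text_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy stand-ins for encoder outputs (hypothetical values).
slide_embedding = [0.9, 0.1, 0.2]
class_text_embeddings = {
    "lung adenocarcinoma": [1.0, 0.0, 0.1],
    "lung squamous cell carcinoma": [0.0, 1.0, 0.3],
}

predicted, scores = zero_shot_classify(slide_embedding, class_text_embeddings)
print(predicted)
```

No task-specific classifier is trained here: adding a new class only requires a new text prompt and its embedding, which is what makes the setting zero-shot.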

Article written by Hoifung Poon