How AI Helps in Whole Slide Image Analysis for Cancer Detection
For many years, biopsy has been one of the main means of cancer diagnosis. A doctor takes a sample of the tissue in question, and a pathologist, among other tests, examines it thoroughly under the microscope, looking for affected cells or other signs of developing disease. This process is quite tedious, however, as the abnormal cells are minuscule compared to the enormous image of the tissue – the actual whole slide image.
Modern histopathology involves multiple methods of analyzing human cells, including tissue scans and anamnesis data. Today, healthcare organizations rely on Whole Slide Imaging (WSI) to save a digital scan of the entire histology specimen. WSI scans help doctors analyze larger areas to detect malignant cells or other mutations in a specific organ tissue.
However, working with WSI is enormously time consuming, since such slides come at very high resolutions, and examining them manually is a laborious task. Given this problem, the MobiDev team challenged themselves to search for an AI pathology solution capable of working with WSI and supporting digital healthcare workflows.
Image credits to National Cancer Institute.
What is WSI?
A whole slide image is a high-resolution, large-format scan of human tissue. WSI images are also called microscope slides, as they represent an enlarged image of cells and subcellular structures. Such slides are obtained by scanning different areas of a tissue sample and stitching the scans into one large image. The output is a high-quality digital scan that can be zoomed in up to 40 times without losing much detail.
WSI slide magnification
Compared to the traditional microscopic analysis of histology, WSI provides a number of advantages:
- Whole slide images usually come in giant resolutions, like 100,000 x 100,000 pixels and larger, which gives a lot of clarity when visually inspecting tissue areas.
- WSI is a digital scan that covers much larger areas all at once and is easier to zoom in/zoom out, compared to a microscopic view.
- The image can be shared between healthcare experts as a digital copy. Additionally, WSI scans can’t deteriorate the way specimens on a glass slide do.
However, in terms of operability, WSI analysis is less convenient because of the time and effort manual examination requires. In real-world practice, a doctor sits before the image, scanning it by eye and zooming in and out for hours. During manual examination, the whole slide must be inspected to spot suspicious cell structures or mutations, because skipping areas may result in medical error.
This is a general problem for healthcare workflows, since long processes prevent quick diagnosis and efficient treatment, both of which are critical for patients with cancer and for histopathology as a whole. With advances in digital pathology and changes in healthcare regulatory requirements, applying artificial intelligence to clinical data became possible. Given that a WSI is a gigapixel-sized image containing tons of unstructured data, the problem of manual examination can be addressed through machine learning.
Applying AI for WSI scans processing
Regardless of its content, a WSI can be broken down into features used for training machine learning models capable of running object detection, recognition, and classification tasks. These tasks are handled with computer vision, which relies on optical sensors or digitally processes images to extract meaningful information.
However, since we don’t know where the potential tumor cells are, we can’t perform one essential ML task: data labeling. Labels are required to tell the model what the cells in each region are, so it can learn the difference between normal and damaged tissue. To work around this, convolutional neural networks (CNNs) are used.
CNNs require little pre-processing of the input data, as they are designed to find patterns in visual data autonomously. Applied to WSI scans, a CNN is capable of extracting features from weakly labeled or completely unlabeled data. The extracted features can then represent the whole slide in a compact form, which is helpful for storing images or grouping similar items.
However, minor pre-processing steps are still required, since the whole slide is too demanding on memory for the model to process at once. Because of the giant image size, the slide is typically sliced into equally sized square images. Now, let’s look at the example in practice.
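The slicing step can be sketched in a few lines of NumPy. The function name and array sizes below are illustrative, not part of any production pipeline; real WSI files are usually read region by region with a library such as OpenSlide rather than loaded whole into memory.

```python
import numpy as np

def slice_into_patches(slide: np.ndarray, patch_size: int = 256):
    """Cut an RGB slide array into non-overlapping square patches.

    Hypothetical helper for illustration: it assumes the slide already
    fits in memory as a NumPy array, which real gigapixel WSIs do not.
    """
    h, w = slide.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patches.append(slide[y:y + patch_size, x:x + patch_size])
    return patches

# Toy example: a 1024x768 "slide" yields a 4x3 grid of 256px patches.
slide = np.zeros((768, 1024, 3), dtype=np.uint8)
patches = slice_into_patches(slide, patch_size=256)
print(len(patches))  # 12
```

Edge regions narrower than a full patch are simply dropped here; real pipelines may instead pad them or use overlapping tiles.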
MobiDev’s Research: WSI processing through CLAM approach
With their extensive expertise in computer vision, MobiDev approached the WSI processing system design through clustering-constrained attention multiple instance learning (CLAM). As a training dataset, the publicly available Clear Cell Renal Cell Carcinoma dataset was used.
To prepare the data for training, images need to be cut into patches (equally sized square images), because the initial image size is too large. In addition, the data used for the experiment was weakly labeled, meaning there were no labels for individual image patches.
The task was to create a model capable of analyzing WSI scans and highlighting regions of interest – in other words, suspicious cells found in the tissue. Because healthcare is built on critical decisions, the role of AI app development in pathology is to support the doctor with hints that speed up the diagnostic process.
Region of interest in the cell structure
However, the prediction should also be descriptive, because the user has to understand how the model derived a given result. This is what’s called AI explainability, a key feature of modern AI pathology systems. You can read more about the forms and approaches of AI explainability in decision-making applications.
CLAM is a data-efficient approach to working with weakly-labeled whole-slide imagery. The essence of it can be broken down into a number of steps:
- On the WSI, the tissue/cell sample is segmented from the background using computer vision algorithms.
- Within the segmented area, the tissue is split into equally sized square patches (usually 256×256 pixels).
- Patches are encoded once by a pretrained CNN into a descriptive feature representation. During training and inference, extracted patches in each WSI are passed to a CLAM model as feature vectors.
- For each class, the attention network ranks each region in the slide and assigns an attention score based on its relative importance to the slide-level diagnosis. Attention-pooling weighs patches by their respective attention scores and summarizes patch-level features into slide-level representations, which are used to make the final diagnostic prediction.
- The attention scores can be visualized as a heatmap to identify ROIs and interpret the important features.
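The attention-pooling step above can be illustrated with a minimal NumPy sketch. The weights here are random stand-ins for a trained CLAM attention network, and the shapes are illustrative; the point is only how per-patch scores are normalized and used to weigh patch features into one slide-level vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

# Assume each of N patches has already been encoded by a pretrained CNN
# into a D-dimensional feature vector (random stand-ins here).
N, D, H = 6, 512, 128
features = rng.normal(size=(N, D))

# Toy attention-network weights; a trained CLAM model learns these.
V = rng.normal(size=(D, H)) * 0.01
w = rng.normal(size=(H,)) * 0.01

scores = np.tanh(features @ V) @ w   # one raw score per patch
attention = softmax(scores)          # normalized attention weights
slide_repr = attention @ features    # attention-pooled slide vector

print(round(attention.sum(), 6))  # 1.0
print(slide_repr.shape)           # (512,)
```

The `slide_repr` vector is what a downstream classifier would consume to make the slide-level prediction, while `attention` holds the per-patch scores used for the heatmap.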
Left: direct visualization of the attention scores. Each pixel represents a 256px patch.
Right: attention scores blended onto the WSI. The colored regions represent the most attended parts of the sample.
As we can see, CLAM’s attention network is quite interpretable. It outputs an “importance score” for each patch, so the scores can be visualized as a heatmap. If you take a closer look at the WSI on the right, you can see the affected cells colored blue, meaning the network determined they would affect the prediction the most.
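As a rough sketch of how such a heatmap is assembled, the per-patch scores can be normalized and expanded back to pixel space, where each patch’s score fills its 256px square. The grid dimensions and score values below are made up for illustration.

```python
import numpy as np

# Attention scores for a hypothetical 3x4 grid of 256px patches
# (illustrative values, not from a real model).
grid_h, grid_w, patch = 3, 4, 256
scores = np.array([[0.02, 0.05, 0.10, 0.03],
                   [0.04, 0.30, 0.25, 0.06],
                   [0.01, 0.08, 0.05, 0.01]])

# Normalize to [0, 1] and expand each score over its patch's pixel area,
# producing a mask that can be alpha-blended onto the slide image.
norm = (scores - scores.min()) / (scores.max() - scores.min())
heatmap = np.kron(norm, np.ones((patch, patch)))

print(heatmap.shape)  # (768, 1024)
```

Rendering the `heatmap` array with a colormap and blending it over the slide yields the kind of overlay shown in the figure above.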
Using neural networks to process weakly labeled WSI images provides a couple of advantages in this case. First, we can discover clinically relevant but previously unrecognized morphological characteristics that pathologists have not used in visual assessment. Second, the model can recognize features beyond human perception, which can potentially provide more accurate diagnostic results and have a positive impact on surgical or therapeutic outcomes.
Addressing the WSI complexity and interpretability
This approach addresses important WSI processing difficulties. For example, to handle the large size of the slide, we segment the actual tissue from the scan background, trimming away unnecessary information. Then, we split the remaining ROI into equally treated tiles that fit into GPU memory.
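As a simplified illustration of the segmentation step, Otsu thresholding can separate stained tissue from the bright scan background. CLAM-style pipelines typically threshold the saturation channel of an HSV image with OpenCV; the self-contained grayscale NumPy version below conveys the same idea on a toy image.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Find the intensity threshold that best separates two classes,
    e.g. dark stained tissue vs. bright background."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = gray.size
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0 = hist[:t].sum() / total      # weight of the darker class
        w1 = 1.0 - w0                    # weight of the brighter class
        if w0 == 0 or w1 == 0:
            continue
        m0 = (hist[:t] @ np.arange(t)) / (w0 * total)        # class means
        m1 = (hist[t:] @ np.arange(t, 256)) / (w1 * total)
        var = w0 * w1 * (m0 - m1) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Toy slide: bright background (230) with a darker "tissue" block (90).
gray = np.full((100, 100), 230, dtype=np.uint8)
gray[30:70, 30:70] = 90
t = otsu_threshold(gray)
tissue_mask = gray < t  # True on tissue, False on background
```

Only patches whose area overlaps `tissue_mask` would then be extracted, which is what keeps the empty background from wasting compute.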
Labeling is another difficulty. It takes hours for a pathologist to look through a slide and make a diagnosis, let alone label it, i.e., identify the affected cells. In contrast, CLAM is designed to work with this kind of weakly labeled data, where only the slide-level label (i.e., the diagnosis) is known. CLAM itself determines which regions of the tissue affect the predicted diagnosis.