Doug Hoover's Master's Thesis

Automatic Stain-Based Classification in Biological Images Using Adaptive Archetype Color Estimation

Abstract

1. Introduction

The problem of classifying stained objects on microscope slides is pervasive. Accurate classification is required to draw correct conclusions in biomedical research, in clinical pathology (e.g., breast biopsy quantitation), and in assays such as the unscheduled DNA synthesis assay in genetic toxicology. Currently, many of these classification tasks are carried out manually -- an extremely repetitive task that requires well-trained personnel. There is an increasing need for automation of these procedures, to reduce the cost, or to improve the consistency or speed of the analysis.

Automation of these tasks has become much more practical with the widespread availability of color video microscopes, and inexpensive, powerful computers with image capture capability. Image processing algorithms can now be used to automate, or at the very least assist, a variety of bioassays, pathology studies, and related applications. While the human observer is often able to discriminate colors from video images, the design of fully automatic image analysis algorithms for this purpose has proven difficult. Even when methods are successfully developed that solve a particular problem, they frequently require considerable tuning to work for a different application -- if they can be made to work at all. A technique that could be used to address many different cytological problems would make it much easier to put together new systems, and would therefore be very useful.

The problem of color-based automatic segmentation can often be reduced to intensity-based segmentation: carefully designed color filters may enable optical separation of the objects of interest. Sometimes, however, there is significant spectral overlap between the objects of interest; these cases are not as amenable to filter-based techniques. (The original problem that we set out to solve, the classification of ER/PR slides used to select appropriate treatments for breast carcinomas, was one such case.) Also, the selection of appropriate color filters can be a non-trivial task, and filters that work well for one problem are not likely to be as good for the next; they tend to be application-specific.

Many classification problems of interest were originally structured for manual differentiation. The human eye has remarkable color discrimination capabilities, which have proven difficult to match with image processing algorithms. Staining procedures are often not calibrated precisely on a quantitative basis, since the human viewer has little difficulty adjusting to such variations. Many of the stains that are used (e.g., hematoxylin) are naturally derived substances that exhibit significant variability. Furthermore, much of the commonly available color imaging equipment, such as video cameras and monitors, is optimized for the human perceptual system, rather than being consistently calibrated for numerical computer image analysis. Hence, there is a need for image processing algorithms that are robust enough to handle the variability that is often encountered.

The image processing technique that we present here seeks to address all of these issues, and provide a solution to the problem of stain-based classification that is robust and widely applicable. It is designed to generate good results for any class of microscope slides stained with any two stains. This is a very common situation: frequently a counterstain is used to make all the significant objects on a slide visible, and then a second stain is applied to mark particular objects of interest. When this technique is used, it usually produces many objects that are stained with mixtures of both stains. Our algorithms handle such mixture situations particularly well.

One key feature of our classification methodology is that it is an "unsupervised" technique. Many automatic image processing algorithms require a "training" phase, where a human expert must "supervise" and train the algorithm by giving it examples of the specific things to be recognized (in this case, colors), or by pointing out classification errors. The algorithm then must be retrained for every new application. Even worse, when there is significant variability (as there is in biological stains), frequent retraining may be required even to handle a single problem. Unsupervised techniques such as ours, however, can generate good results without a lengthy training phase. They are able to determine any necessary parameters automatically, given only minimal information. This makes them far more resilient to variability, and gives them far wider (and far easier) applicability to new problems.

The problem of unsupervised color-based classification is relatively straightforward when the colors are sufficiently spectrally distinct. The colors then form well-separated clusters in "color space", and any of several well-known unsupervised clustering algorithms, such as the "k-means" or "nearest-neighbors" techniques, can be applied. Classical clustering algorithms do not work, however, if such clusters do not exist. We have found that this is often the case with the colors on stained microscope slides; presumably this is because there are varying degrees of mixtures of two or more stains. Some researchers, such as Ranefall¹, have sought alternative solutions. We present here another novel approach, which we have found gives good results for a wide array of specimens, under even difficult conditions.
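
To make the well-separated case concrete, the following is a minimal sketch of classical k-means clustering applied to pixel colors. It illustrates only the standard technique mentioned above, not the approach presented in this thesis. The function name kmeans_colors, the two RGB stain colors, and the noise level are all hypothetical values chosen for illustration.

```python
import numpy as np


def kmeans_colors(pixels, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on an (N, 3) array of RGB values.

    Returns the (k, 3) cluster centers and an (N,) array of labels.
    """
    rng = np.random.default_rng(seed)
    pixels = np.asarray(pixels, dtype=float)
    # Initialise the centers with k distinct pixels chosen at random.
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every pixel to every center: shape (N, k).
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    # Final assignment against the last set of centers.
    d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


# Hypothetical, spectrally distinct stain colors (assumed RGB values).
rng = np.random.default_rng(1)
blue_stain = rng.normal([60, 80, 160], 8.0, size=(200, 3))
brown_stain = rng.normal([150, 90, 50], 8.0, size=(200, 3))

centers, labels = kmeans_colors(np.vstack([blue_stain, brown_stain]), k=2)
print(np.round(centers))
print(np.bincount(labels, minlength=2))
```

With two compact, distant clusters such as these, the recovered centers land close to the two assumed stain colors. When the pixels are instead spread continuously between the two colors, as with mixtures of stains, k-means still returns k centers, but they no longer correspond to distinct groups in the data.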
