Scientists have developed an image analysis system they claim can accurately determine breast cancer prognosis by analyzing thousands of automatically acquired measurements of epithelial and stromal features from tissue microarray (TMA) images.
The system, called C-Path, was developed using a machine learning approach that taught it to distinguish stromal from epithelial cancer tissue in the microarray images. The system then identifies and analyzes morphometric features that pathologists already use to grade tumors, along with higher-level contextual, relational, and global image features.
Reporting in Science Translational Medicine, the team that developed C-Path, led by Stanford University's Department of Computer Science, hopes the system will help generate new insights into breast cancer progression and provide a more objective method for predicting patient prognosis. It could also potentially be used to stratify patients for clinical trials and to identify features associated with drug response. Daphne Koller, Ph.D., and colleagues describe the platform in a paper titled “Systematic Analysis of Breast Cancer Morphology Uncovers Stromal Features Associated with Survival.”
Current methods for grading invasive breast cancer are based on the same three histological features (tubule formation, epithelial nuclear atypia, and epithelial mitotic activity) that were first reported back in 1928, the researchers explain. And while there is considerable ongoing research into molecular profiling to assess prognosis and predict cancer response to therapy, microscopic image assessment is still the most widely available and financially feasible method used worldwide.
Prognostic information can, however, also be derived from the molecular characteristics and morphological features of the cancer stroma, so the team set out to build an image-based system that could identify clinically predictive morphological features of breast cancer. Unlike previous work in cancer morphometry, the system would not be limited to analyzing a predefined set of features used by pathologists. Rather, it would measure a more extensive, quantitative set of features from both the breast cancer epithelium and the stroma.
The Stanford team constructed and evaluated the model using hematoxylin and eosin (H&E)-stained histological images from breast cancer TMAs derived from patients in the Netherlands Cancer Institute (NKI) cohort. The image processing pipeline and the procedure for building the prognostic model were carried out in stages. A series of processing steps separated the tissue from the background, partitioned the image into smaller regions known as superpixels, found nuclei within these superpixels, constructed nuclear and cytoplasmic features, and compared measurements between superpixels.
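To make the first two of these steps more concrete, here is a toy sketch in Python. It is not C-Path's code and rests on assumptions the article does not state: the image is a grayscale grid with intensities from 0 (dark) to 255 (bright), pixels brighter than a hypothetical cutoff of 220 are treated as empty glass, and fixed square blocks stand in for true superpixels, which in practice follow tissue boundaries. All function names are illustrative.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]            # grayscale rows, 0 = dark, 255 = bright
Pixels = List[Tuple[int, int]]     # (row, col) coordinates
Regions = Dict[Tuple[int, int], Pixels]

# Hypothetical cutoff (an assumption): brighter pixels are treated as glass.
BACKGROUND_THRESHOLD = 220


def tissue_mask(image: Image, threshold: int = BACKGROUND_THRESHOLD) -> List[List[bool]]:
    """Mark each pixel True if it looks like stained tissue rather than background."""
    return [[pixel < threshold for pixel in row] for row in image]


def grid_superpixels(image: Image, block: int) -> Regions:
    """Split the image into square blocks, a crude stand-in for real superpixels.

    Returns a mapping from block index (block_row, block_col) to the
    pixel coordinates that fall inside that block.
    """
    regions: Regions = {}
    for r, row in enumerate(image):
        for c in range(len(row)):
            regions.setdefault((r // block, c // block), []).append((r, c))
    return regions


def tissue_superpixels(image: Image, block: int, min_fraction: float = 0.5) -> Regions:
    """Keep only blocks in which at least `min_fraction` of the pixels are tissue."""
    mask = tissue_mask(image)
    kept: Regions = {}
    for key, pixels in grid_superpixels(image, block).items():
        tissue = sum(mask[r][c] for r, c in pixels)
        if tissue / len(pixels) >= min_fraction:
            kept[key] = pixels
    return kept


if __name__ == "__main__":
    # 4x4 toy image: left half dark (tissue), right half bright (glass).
    toy = [[80, 90, 250, 250] for _ in range(4)]
    print(sorted(tissue_superpixels(toy, block=2)))  # [(0, 0), (1, 0)]
```

Real pipelines typically generate superpixels with an algorithm that groups similar neighboring pixels; the grid here only shows how an image can be divided into small units that are then measured and classified one by one.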
The researchers first manually labeled superpixels as epithelium or stroma, then used a machine learning technique to train the program to classify superpixels as stroma or epithelium automatically across large numbers of slides, based on 31 features. Constructing the final set of features used in the prognostic model involved recomputing the basic measurements separately within epithelium and stroma, and subclassifying nuclei as typical or atypical.
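The article does not say which learning algorithm C-Path used for this classification step. As a stand-in, the Python sketch below trains a nearest-centroid classifier: it averages the feature vectors of the hand-labeled superpixels in each class, then assigns a new superpixel to whichever class average is closest. The three-element feature vectors are arbitrary toy values (C-Path's classifier used 31 features), and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def train_centroids(features: List[Vector], labels: List[str]) -> Dict[str, List[float]]:
    """Average the feature vectors of each manually labeled class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(vec: Vector, centroids: Dict[str, List[float]]) -> str:
    """Assign a superpixel to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


if __name__ == "__main__":
    # Hypothetical 3-feature vectors for hand-labeled superpixels.
    train_x = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2], [0.1, 0.2, 0.9], [0.2, 0.1, 0.8]]
    train_y = ["epithelium", "epithelium", "stroma", "stroma"]
    model = train_centroids(train_x, train_y)
    print(classify([0.85, 0.7, 0.15], model))  # epithelium
```

A nearest-centroid rule is about the simplest supervised classifier there is; it is used here only to show the train-on-labeled-examples, then-classify-the-rest workflow the researchers describe.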
Having learned to classify stroma and epithelium, the system could then take measurements from contiguous epithelial and stromal regions, as well as from epithelial nuclei, epithelial atypical nuclei, epithelial cytoplasm, stromal round nuclei, stromal spindled nuclei, stromal matrix, and unclassified objects. Relational features that gave a more global view of the tissue included the mean distance from epithelial nuclei to stromal nuclei, the mean distance from atypical to typical epithelial nuclei, and the distance between stromal regions. In total, the system measured 6,642 features per image.
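The article names these relational features without giving their exact definitions. One plausible reading of "mean distance from epithelial nuclei to stromal nuclei", assumed here rather than taken from the paper, is: for every epithelial nucleus, find the nearest stromal nucleus, then average those distances. In this Python sketch nuclei are hypothetical (x, y) centroid coordinates in pixels.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) nucleus centroid, in pixels


def mean_nearest_distance(sources: List[Point], targets: List[Point]) -> float:
    """Average, over all `sources`, of the distance to the closest point in `targets`."""
    if not sources or not targets:
        raise ValueError("both nucleus lists must be non-empty")
    total = 0.0
    for s in sources:
        # Brute-force nearest neighbor; fine for a toy example.
        total += min(math.dist(s, t) for t in targets)
    return total / len(sources)


if __name__ == "__main__":
    epithelial = [(0.0, 0.0), (10.0, 0.0)]
    stromal = [(0.0, 3.0), (10.0, 4.0), (50.0, 50.0)]
    print(mean_nearest_distance(epithelial, stromal))  # 3.5
```

Called with atypical and typical epithelial nuclei instead, the same function would give the second relational measurement mentioned in the article. On a full image with thousands of nuclei, a spatial index such as a k-d tree would replace the brute-force search.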
They tested the resulting C-Path system on microscopic images from the NKI cohort, which had been used to train the classifier, and from the independent Vancouver General Hospital (VGH) cohort. In both cases the C-Path scores were highly predictive of survival, independent of other clinical and molecular factors such as tumor grade, ER status, age, tumor size, lymph node status, mastectomy, chemotherapy, a 70-gene prognosis signature, hypoxia signature, wound response signature, genomic grade index, or intrinsic molecular subtype. In fact, the authors remark, when C-Path was compared directly with pathological grading on exactly the same set of images, it was far more accurate at predicting survival, particularly when multiple images were available for an individual case.
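The article does not say which statistics were used to judge how well the scores predicted survival. One common way to check whether a risk score is predictive of survival is Harrell's concordance index (C-index), sketched below in Python as an illustration rather than as the study's analysis. Among pairs of patients whose outcomes can be ordered, it is the fraction in which the patient with the higher risk score died first; 0.5 corresponds to chance and 1.0 to perfect ordering. The inputs are hypothetical, and this simple version skips pairs with tied follow-up times.

```python
from itertools import combinations
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Harrell's C-index: share of comparable patient pairs ordered correctly by risk.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    risks  -- model score; higher means a worse predicted outcome
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Put the patient with the shorter follow-up first.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is usable only if the earlier time is an observed death and
        # the times differ (tied times are skipped in this simplified version).
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risks[i] > risks[j]:
            concordant += 1.0
        elif risks[i] == risks[j]:
            concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [2, 4, 6, 8]             # hypothetical follow-up, e.g. in years
    events = [1, 1, 0, 1]            # third patient censored
    risks = [0.9, 0.5, 0.7, 0.1]     # hypothetical model scores
    print(concordance_index(times, events, risks))  # 0.8
```

Here four of the five comparable pairs are ranked correctly; the one miss is the second patient, who died earlier but was scored as lower risk than the third.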
Interestingly, when the team looked more closely at the C-Path features most clearly associated with survival, they found that seven of the top features characterized the contextual relationships of epithelial and stromal objects to their neighbors. “Because cancer is a disease of abnormal tumor cell growth and abnormal cellular relationships between tumor cells and stroma (unlimited replicative potential, loss of growth inhibition between neighboring transformed cells, cancer cell invasion of neighboring tissue), it is perhaps not surprising that relational features form key prognostic factors in breast cancer,” they note.
Pathologists currently use only epithelial features in the standard grading scheme for breast cancer. But when the team tested the predictive accuracy of stromal and epithelial features separately, they found that a model using only stromal features was strongly associated with overall survival in the independent VGH dataset, with a survival association similar to that of the full C-Path model. Moreover, a prognostic model built on just three stromal features was a stronger predictor of patient outcome than one built from the top eight epithelial features.
More specifically, the stromal feature most strongly associated with prognosis was a measure of the variability of intensity differences between the stromal matrix and its neighbors, with high values associated with improved outcome. High-scoring breast cancer tissue tended to contain larger contiguous regions of stroma separated from larger contiguous epithelial regions. “This pattern of cancer growth more closely approximates epithelial-stromal relationships observed in the normal breast,” the researchers write.
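The article describes this feature only in words. Under one possible reading, which is an assumption and not the paper's definition, each stromal matrix region has a mean intensity; for every pair of neighboring stromal regions you take the absolute intensity difference, then summarize how variable those differences are with a standard deviation. The Python sketch below computes that quantity from hypothetical region intensities and a hypothetical adjacency list.

```python
import statistics
from typing import Dict, List


def neighbor_difference_variability(intensity: Dict[str, float], neighbors: Dict[str, List[str]]) -> float:
    """Population standard deviation of |intensity difference| over neighboring region pairs.

    intensity -- mean stromal-matrix intensity for each region id (hypothetical input)
    neighbors -- region id -> ids of adjacent stromal regions (hypothetical input)
    """
    diffs = []
    seen = set()
    for region, adjacent in neighbors.items():
        for other in adjacent:
            pair = frozenset((region, other))
            if pair in seen:  # count each adjacency only once
                continue
            seen.add(pair)
            diffs.append(abs(intensity[region] - intensity[other]))
    if len(diffs) < 2:
        raise ValueError("need at least two neighboring pairs")
    return statistics.pstdev(diffs)


if __name__ == "__main__":
    regions = {"a": 100.0, "b": 110.0, "c": 130.0}
    adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(neighbor_difference_variability(regions, adjacency))  # 5.0
```

In the toy example the two neighboring pairs differ by 10 and 20 intensity units, giving a standard deviation of 5.0. The sketch only shows how such a variability statistic could be computed; it says nothing about why high values of the real feature tracked better outcomes.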
The authors admit there is some way to go before C-Path can be used in a clinical setting. Their system was developed on breast cancer TMA images, not the multiple whole-slide images used in routine diagnostic pathology. Nevertheless, they point out, given the system's ability to provide a prognosis from a very small sample of tumor, C-Path may prove useful for deriving prognostically important information from small tumor biopsy specimens. And training the system on a dataset of whole-slide images might actually improve its performance further.
However, applying the system to whole-slide images will require either manual or automated identification of the cancer, because while TMA images include only breast cancer tissue, whole-slide images typically contain both cancer and the surrounding normal breast tissue.
Another task will be to test the model on other independent cohorts of breast cancer images from institutions that may have handled samples differently, the researchers point out. It may be necessary to retrain the epithelial-stromal classifier on a subset of images from each new institution, a process that may require labeling 50–60 images.
If these factors can all be addressed, the team believes the C-Path system could have widespread utility for providing prognostic and disease-progression-based information on a range of cancer types. “We believe that the flexible architecture of the C-Path system—consisting of the construction of a comprehensive feature set within a machine learning framework—will enable the application of C-Path to build a library of image-based models in multiple cancer types, each optimized to predict a specific clinical outcome, including response to particular pharmacologic agents, thereby allowing this approach to be used to directly guide treatment decisions.”