One of Dr. Schadt's biggest challenges is how to unify mathematically distinct modeling approaches to develop better models of disease. As he sees it, data-driven, hypothesis-free approaches, i.e., structure-learning methods that assume we don't know the rules of complex systems but must learn them from big data, are largely pursued independently of more hypothesis-driven modeling, i.e., approaches in which we assume we know the rules and how things are connected.
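The contrast can be made concrete with a minimal sketch. The code below is purely illustrative (the gene names, data, threshold, and `infer_network` function are all hypothetical, not Dr. Schadt's actual methods): a data-driven step proposes network edges from correlations in the data, while a hypothesis-driven step contributes edges asserted in advance by prior biological knowledge, and the final network is the combination of both.

```python
# Illustrative sketch only: combining data-driven edge discovery with
# hypothesis-driven prior edges in a toy gene network. All names and
# numbers here are invented for the example.
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def infer_network(data, prior_edges, r_threshold=0.5):
    """Data-driven part: keep gene pairs whose |correlation| clears a
    threshold. Hypothesis-driven part: always include edges asserted
    by prior knowledge, whether or not the data support them."""
    genes = sorted(data)
    edges = set(prior_edges)  # start from the hypothesis-driven edges
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if abs(pearson(data[g1], data[g2])) >= r_threshold:
                edges.add((g1, g2))  # learned from the data
    return edges


# Toy data: geneB closely tracks geneA; geneC is independent noise.
random.seed(0)
a = [random.gauss(0, 1) for _ in range(200)]
data = {
    "geneA": a,
    "geneB": [v + random.gauss(0, 0.1) for v in a],
    "geneC": [random.gauss(0, 1) for _ in range(200)],
}

# Prior knowledge asserts a geneA-geneC link the data alone would miss.
net = infer_network(data, prior_edges={("geneA", "geneC")})
```

Run on this toy data, the network contains the geneA–geneB edge (discovered from correlation) and the geneA–geneC edge (contributed by the prior), showing how the two sources of evidence complement each other. Real approaches in this space are far more sophisticated, but the division of labor is the same.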
“We are seeking to integrate these approaches so that we get the best of both worlds while minimizing the weaknesses of each,” notes Dr. Schadt.
Dr. Schadt has some impressive resources available for his quest. The institute currently uses roughly 40,000 square feet, which includes a CLIA-certified sequencing core, a supercomputer (Minerva), wet labs for sample preparation and molecular biology experiments, and dry lab space for different computational groups (statistical genetics, bioinformatics and sequence informatics, and network modeling/systems biology).
The institute also has some unique capabilities, including computing power that can manage petabyte [one quadrillion (10¹⁵) bytes] scales of data and world-class information technology personnel.
About half of the institute’s current faculty of 30 are experts in network modeling, predictive modeling, or machine learning. The other half focuses on sequence informatics, disease biology, and building interfaces. Dr. Schadt hopes to create the right ecosystem so that this diversity of talent across disciplines shares the same space, learning and working together.
He’s still looking for additional talent. “We have staff scientist positions, faculty positions, postdoctoral positions, and we are recruiting students for our computational biology Ph.D. program.”
Interestingly, five years ago virtually none of these areas of expertise would have been required in any medical center. Today, they are essential to one of the institute’s key missions: handling large volumes of data, with the goal of developing information and creating understanding that can be translated rapidly into the clinical setting.