If you want to understand how a machine works, you need more than a parts list. You need to see where all the pieces are located and how they fit together, especially if you’re trying to understand a complicated machine—a human cell, for instance. Well, for some time now, we’ve had detailed, protein-level parts lists for human cells. These parts lists, or proteomic profiles, have their uses, but they offer scant guidance when we try to follow the mass, energy, and information transfers that underlie living processes.

To effectively create technical drawings or illustrations of the cell, scientists based at the Chan Zuckerberg Biohub (CZ Biohub) combined endogenous tagging, live-cell imaging, and interaction proteomics. This three-pronged approach allowed the scientists, in their words, to “image the localization of each protein in live cells, as well as the interactions between a given target and other proteins within the cell.” The scientists, who were led by Manuel D. Leonetti, PhD, asserted that their work will facilitate a systems-level description of the organization of the human proteome.

Details of the work appeared March 10 in the journal Science, in an article titled, “OpenCell: Endogenous tagging for the cartography of human cellular organization.”

“Using high-throughput CRISPR-mediated genome editing, we constructed a library of 1,310 fluorescently tagged [HEK293T] cell lines,” the article’s authors wrote. “By performing paired [immunopurification–mass spectrometry (IP-MS)] and live-cell imaging using this library, we generated a large dataset that maps the cellular localization and physical interactions of the corresponding 1,310 proteins. Applying a combination of unsupervised clustering and machine learning for image analysis allowed us to objectively identify proteins that share spatial or interaction signatures.”

Besides introducing an integrated experimental pipeline for high-throughput cell biology, the scientists inaugurated OpenCell, an open-source collection of protein localization and interaction measurements. The collection, which includes measurements from the current study, is easily accessible through an interactive web interface at opencell.czbiohub.org.

The scientists also described how they approached image analysis. They combined unsupervised clustering and machine learning to generate insights into the function of individual proteins, and to derive some general principles of human cellular organization.

“In particular, we show that proteins that bind RNA form a separate subgroup defined by specific localization and interaction signatures,” the scientists noted. “We also show that the precise spatial distribution of a given protein is very strongly correlated with its cellular function, such that fine-grained molecular insights can be derived from the analysis of imaging data.”

Methodology for the OpenCell library. (A) Functional tagging with split-mNeonGreen2. (B) Endogenous tagging strategy. (C) Experimental pipeline. (D) Detection of fluorescence. (E) Data analysis. [CZ Biohub, bioRxiv, CC-BY-NC-ND 4.0]

The scientists acknowledged that using endogenous fluorescent tags has certain limitations. For example, the tags are about as large as an average human protein, so their insertion can alter a target protein’s expression, localization, function, or degradation rate. Also, tagging may not allow protein isoforms (including post-translationally modified variants) to be discriminated. Finally, endogenous tagging may miss low-abundance proteins.

“Overall,” the scientists stated, “the full description of human cellular architecture remains a formidable challenge that will require complementary methods being applied in parallel.” A similar point was made in a review (“Subcellular Transcriptomics and Proteomics: A Comparative Methods Review”) that was prepared by University of Cambridge scientists and published in Molecular & Cellular Proteomics.

“Several options [are] available to researchers to address biological questions concerning the subcellular localization and trafficking of proteins and transcripts,” the reviewers noted. “However, the technical challenges can still be vast and differ between transcriptomics and proteomics, as well as the biological system and question in hand, which is the intrinsic reason why there is lack of a one-size-fits-all approach.”

Nonetheless, the reviewers also sounded an optimistic note: “Coupling -omics with localization studies is still largely in its infancy but is rapidly growing because of advancement of sample preparation strategies and equipment reaching a pinnacle with single-molecule tracking, sequencing, and current MS technology. Not only have subcellular -omics technologies aided our insight into global spatial organization (e.g., HPA Cell Atlas), biological processes (e.g., cell cycle and embryonic development), and pathologies (e.g., cancer biology) but are also emerging in diagnostic applications for patients.”

For its part, the CZ Biohub team is confident that its methodology can identify “complex but deterministic signatures from light microscopy images,” opening “exciting avenues for deep phenotyping and functional genomics.” The team added that because light microscopy is easily scalable, can be performed live, and enables measurements at the single-cell level, the methodology should “offer rich opportunities for the full quantitative description of cellular diversity in normal physiology and disease.”

A brief Q&A with Dr. Leonetti follows:

Is CZ Biohub working to expand the number of identified proteins beyond the 1,300 cited in the current article?

Yes. So far, OpenCell covers about 7% of the whole proteome. We are revamping and automating our experimental pipelines using robotics and software to increase our throughput. Our goal is to significantly expand the coverage of the proteome. An exciting avenue would be to probe proteins in different cell types (not all proteins are expressed in all cells). We can do this by first building tagged libraries in stem cells, which can then be differentiated into different cell types.

What advantages/disadvantages does the CZ Biohub’s approach have with respect to other spatial biology approaches? How does it complement other approaches?

One important advantage of our microscopy approach is that we can profile cells that are alive. That will be key to unlocking cellular dynamics. One disadvantage is that because we rely on endogenous expression, proteins expressed at a low copy number are hard to detect (we can solve this by engineering brighter fluorescent probes). Another disadvantage is that because we modify genes, all protein isoforms expressed from the same genes cannot easily be distinguished. For example, post-translationally modified proteins cannot be tracked. Immunofluorescence is a great complementary tool for this.

Our image analysis work shows that there is a lot of specific, fine-grained information that can be extracted from images of protein localization alone. There is a lot of interest in the field in seeing how much of the information that we typically extract from transcriptomics could be extracted directly from images (cell types, cell states, etc.). It is still early days, but there are papers coming out in the literature that suggest that the answer is going to be: quite a bit of information actually!

Besides the spatial dimension, does CZ Biohub’s approach also take in the temporal dimension?

Absolutely—this is an immediate focus of ours. The development of high-throughput light-sheet microscopes (in which the Biohub community is very active—see, for example, PMID 31061492) will be very enabling for that endeavor.