Scientists at the Karolinska Institutet and the University of Helsinki have devised a method for counting individual RNA and DNA molecules in complex mixtures. They say the approach could in principle improve the accuracy of almost any next-generation sequencing method, is compatible with amplification procedures, and does not require detecting every original molecule or tracking how many copies were made.
Reported in Nature Methods, the technique involves making each original molecule in the sample “unique” by one of several approaches, such as attaching a random DNA sequence label or fragmenting the DNA, so that each molecule carries a unique molecular identifier (UMI). According to Karolinska’s Jussi Taipale, M.D., and colleagues, as long as the complexity of the resulting library is maintained, it can be amplified, normalized, or otherwise processed without losing information about the original molecule count: the number of distinct UMIs in the library acts as a molecular memory of the number of molecules in the initial sample. During sequencing, each UMI is detected many times, but the number of original molecules can be calculated by counting each UMI only once.
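The counting principle can be sketched in a few lines of Python. The UMI strings and the function name below are hypothetical, and each read is reduced to the UMI it carries; the point is only that amplification inflates the read count while the number of distinct UMIs stays at the number of tagged starting molecules.

```python
from collections import Counter


def count_reads_and_molecules(read_umis: list[str]) -> tuple[int, int]:
    """Return (total reads, distinct UMIs) for a list of per-read UMI strings.

    Assumes each UMI was attached to exactly one original molecule, so each
    distinct UMI stands for one starting molecule, however often it is read.
    """
    return len(read_umis), len(set(read_umis))


# Hypothetical reads: three original molecules amplified to different extents.
reads = ["ACGTACGTAA"] * 7 + ["TTGCAGGCAT"] * 2 + ["GATTACAGGC"] * 12
total, molecules = count_reads_and_molecules(reads)
copies = Counter(reads)  # reads per UMI vary with amplification efficiency
print(total, molecules)  # 21 3
print(copies["GATTACAGGC"], copies["TTGCAGGCAT"])  # 12 2
```

Total reads depend on how well each molecule amplified; the distinct-UMI count does not.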
The team also suggests that increasingly accurate estimates of absolute molecule numbers can be made well before all the UMIs have been observed. This contrasts with other counting methods, such as direct single-molecule sequencing, which require every counted molecule to be observed directly. The authors describe the technique in a paper titled “Counting absolute numbers of molecules using unique molecular identifiers.”
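The article does not say which estimator the authors use. Purely to illustrate how a count can be estimated before every UMI has been seen, the sketch below substitutes the classic Chao1 richness estimator from ecology, which infers unseen classes from the number of UMIs seen exactly once (f1) and exactly twice (f2). It is a swapped-in technique, not presented as the paper's method, and the UMI lists are invented.

```python
from collections import Counter


def chao1(read_umis: list[str]) -> float:
    """Classic Chao1 estimate of the total number of distinct UMIs.

    Substituted here for illustration; not necessarily the estimator used
    in the paper. f1 and f2 are the numbers of UMIs seen once and twice.
    """
    copies = Counter(read_umis)
    s_obs = len(copies)
    f1 = sum(1 for c in copies.values() if c == 1)
    f2 = sum(1 for c in copies.values() if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected fallback when no UMI has been seen exactly twice.
    return s_obs + f1 * (f1 - 1) / 2


# Shallow sequencing: several UMIs seen only once, so more are probably unseen.
shallow = ["a", "a", "b", "c", "d", "d", "e"]
print(chao1(shallow))  # 7.25
# Deep sequencing: every UMI seen many times, so the estimate equals the count.
deep = ["a"] * 9 + ["b"] * 6 + ["c"] * 4
print(chao1(deep))  # 3.0
```

As sequencing depth grows, singletons become rare and the estimate converges on the observed count, which is one way to picture an estimate approaching the true number before every UMI has been observed.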
The authors applied their UMI counting technique to digital karyotyping and to mRNA sequencing (mRNA-seq). In the digital karyotyping experiment, they mixed equal amounts of genomic DNA from a Down syndrome patient and his mother. The mixed DNA was fragmented into a library of molecules, and an aliquot containing less than a single genome copy was taken. Combined with fragmentation, using such a small aliquot reduces complexity, so that each molecule is expected to have unique ends and the genomic position of either end can serve as a UMI, they explain. The DNA was then amplified. Total read counts did not clearly show that half of the sample was derived from DNA with trisomy 21 and a single X chromosome, whereas reanalyzing the sample by counting UMIs made this much clearer. The results indicated that the technique could be adapted for use in prenatal diagnostics.
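Two parts of the karyotyping logic can be made concrete. First, with less than one genome copy in the aliquot, a mapped fragment end stands in for the UMI, so PCR duplicates collapse to one count. Second, the expected signal follows from simple arithmetic: in an equal mix of DNA from a trisomy 21 male and his mother, chromosome 21 is present at 3 + 2 = 5 copies for every 4 copies of a normal autosome (a ratio of 1.25), and chromosome X at 1 + 2 = 3 per 4 (0.75). The fragment coordinates and function names are hypothetical, and the sketch uses only the start coordinate and ignores strand.

```python
from collections import defaultdict


def unique_ends_per_chromosome(fragments):
    """Count distinct fragment start positions (used as UMIs) per chromosome.

    Each fragment is a hypothetical (chromosome, start, end) tuple. PCR copies
    of one molecule share coordinates, so they collapse to a single count.
    Strand is ignored here for simplicity.
    """
    ends = defaultdict(set)
    for chrom, start, _end in fragments:
        ends[chrom].add(start)
    return {chrom: len(starts) for chrom, starts in ends.items()}


def expected_mixture_ratio(copies_patient, copies_mother, normal_copies=2):
    """Expected dosage in a 1:1 DNA mix, relative to a disomic autosome."""
    return (copies_patient + copies_mother) / (2 * normal_copies)


fragments = [
    ("chr21", 100, 260), ("chr21", 100, 260), ("chr21", 100, 260),  # 1 molecule, 3 copies
    ("chr21", 540, 700),
    ("chrX", 900, 1050), ("chrX", 900, 1050),
]
print(unique_ends_per_chromosome(fragments))  # {'chr21': 2, 'chrX': 1}
print(expected_mixture_ratio(3, 2), expected_mixture_ratio(1, 2))  # 1.25 0.75
```

Counting raw fragments here would give chr21 four and chrX two, overstating both; counting unique ends removes the amplification noise that the article says obscured the trisomy signal.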
In the second experiment, the researchers tested a labeling protocol for counting mRNA molecules in Drosophila melanogaster S2 cells. RNA was converted to cDNA, and a 10-base-pair random DNA label was incorporated. The resulting cDNA fragments were then amplified directly and sequenced. “In this method, only one fragment is derived from each mRNA,” the authors explain. “The sequence of the label and the 5’ mapped position of the fragments together define the UMI.”
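The quoted definition, in which the random label and the 5’ mapped position together define the UMI, translates naturally into a composite key. A minimal sketch, assuming each read has already been assigned to a gene and reduced to a (gene, label, 5’ position) tuple; the gene names, labels, positions, and function name are all hypothetical.

```python
from collections import defaultdict


def molecules_per_gene(reads):
    """Count distinct (label, 5' position) pairs for each gene.

    Two reads are treated as copies of the same cDNA only if both the
    random label and the mapped 5' position match.
    """
    umis = defaultdict(set)
    for gene, label, pos5 in reads:
        umis[gene].add((label, pos5))
    return {gene: len(keys) for gene, keys in umis.items()}


reads = [
    ("geneA", "ACGTTGCAAC", 1204), ("geneA", "ACGTTGCAAC", 1204),  # PCR duplicates
    ("geneA", "ACGTTGCAAC", 1310),  # same label, different position: new molecule
    ("geneA", "TTAGGCATCG", 1204),  # same position, different label: new molecule
    ("geneB", "GGCATTACGA", 88),
]
print(molecules_per_gene(reads))  # {'geneA': 3, 'geneB': 1}
```

Keying on the label alone would merge different molecules that happened to draw the same random label; adding the mapped position makes such collisions much less likely.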
Counting total reads after 15 or 25 PCR cycles led to a marked loss of accuracy: more than 400 of the roughly 5,000 measured genes differed by more than 5% between the samples. Using UMIs to estimate the number of cDNAs in the original sample, by contrast, gave a much higher correlation between samples, with only 10 genes differing by more than 5%. Moreover, the UMI method made it possible to identify the bias introduced by total read counting.
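The comparison in this paragraph amounts to asking, gene by gene, how far two measurements of the same material disagree. The sketch below is a hedged illustration: it assumes counts are first normalized to fractions of each sample's total, and it uses a hypothetical definition of difference (absolute difference divided by the mean of the two fractions), since the article does not state exactly how the 5% threshold was computed. All counts are invented, and labeling the two samples by PCR cycle number is an assumption.

```python
def normalize(counts):
    """Convert raw counts to fractions of the sample total."""
    total = sum(counts.values())
    return {gene: c / total for gene, c in counts.items()}


def genes_differing(frac_a, frac_b, threshold=0.05):
    """Return genes whose relative difference exceeds `threshold`.

    Relative difference is |a - b| / mean(a, b), an assumed definition;
    only genes present in both samples with a nonzero mean are compared.
    """
    flagged = []
    for gene in sorted(frac_a.keys() & frac_b.keys()):
        a, b = frac_a[gene], frac_b[gene]
        mean = (a + b) / 2
        if mean > 0 and abs(a - b) / mean > threshold:
            flagged.append(gene)
    return flagged


# Invented counts for one library sequenced after 15 and 25 PCR cycles.
reads_15 = {"g1": 1000, "g2": 500, "g3": 500}
reads_25 = {"g1": 12000, "g2": 4000, "g3": 4000}
umis_15 = {"g1": 200, "g2": 100, "g3": 100}
umis_25 = {"g1": 201, "g2": 99, "g3": 100}
print(genes_differing(normalize(reads_15), normalize(reads_25)))  # ['g1', 'g2', 'g3']
print(genes_differing(normalize(umis_15), normalize(umis_25)))  # []
```

In this toy case, uneven amplification shifts the read fractions past the threshold for every gene, while the UMI fractions barely move, mirroring the pattern the article describes.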
“In principle, the method can be applied to count all types of molecules or particles such as proteins or viruses that can be stoichiometrically labeled with DNA and subsequently purified from free label,” the authors state. Unlike previous approaches, it can estimate the number of molecules accurately without observing every one of them. The technique could also be modified, by using non-random UMI labels, to provide information about relationships or interactions between molecules, such as identifying fragments that were originally linked together in one macromolecular complex. “The UMI method and its variations are thus likely to improve a large number of next-generation sequencing–based molecule-counting applications and also enable new methods for tracking relationships between molecules,” the team concludes.