Sponsored content brought to you by
Human disease is complex and influenced by a wide range of genomic variants. Accurate sequencing is essential for detecting the full breadth of this variation.
In genomic and multiomic experiments, accuracy depends on the entire workflow. Sample collection, isolation, preservation, and prep are all important steps, as is sequencing itself. But to fully leverage these upfront efforts, a robust toolkit is needed on the back end to get the most out of data analysis and capture more of the meaningful signal in the data.
This crucial final step should be the starting point of initial experimental design discussions, to ensure statistically significant and meaningful results. Yet data analysis has traditionally been complex or overwhelming for researchers not trained in bioinformatics. New tools that offer both well-established and highly customizable workflows through a coding-free interface are now available to assist not only bioinformaticians but also researchers who need accessible ways to explore their data.*
What Is Sequencing Accuracy?
First, a sequencer has to provide correct readouts. For example, the Q-score (a Phred-scaled quality score) expresses the estimated probability that a base was called incorrectly; a higher score means a more reliable call. But instruments are only part of the equation. Without a good analysis pipeline, thousands of variants can be missed. Cost-reducing Illumina advances in multigenome mapping and genomic algorithms address longstanding issues of scalability, accuracy, and comprehensiveness of variant detection across many sizes and types of alleles.
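As a rough illustration of what a Q-score encodes, the sketch below uses the standard Phred relationship Q = -10 * log10(P), where P is the estimated probability that a base call is wrong. The function names are hypothetical, and the default ASCII offset of 33 assumes the common Phred+33 FASTQ quality encoding; neither comes from the article itself.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score into the estimated error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) into a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


def decode_fastq_quality(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into per-base Phred scores.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return [ord(ch) - offset for ch in quality]


def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum the per-base error probabilities across one read."""
    return sum(phred_to_error_prob(q) for q in decode_fastq_quality(quality, offset))


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: estimated error probability {phred_to_error_prob(q):.4%}")
```

Read this way, Q30 corresponds to an estimated 1 in 1,000 chance that a base call is wrong, and Q40 to 1 in 10,000.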
Expanding the Diversity of the Reference Genome
Data accuracy is highly dependent on the diversity of the reference genome. The typical standard used for alignment represents a small, select group of people. According to the National Human Genome Research Institute, 70% of the original reference human genome came from one person of blended ancestry, and the remaining 30% came from a combination of 19 other individuals of mostly European descent.1 Although the reference is regularly updated, ancestry bias persists.
To better represent sequence diversity among individuals throughout the entire human population, Illumina uses a set of prebuilt pangenome references from 128 samples or assemblies that cover 26 ancestries.
This multigenome mapper enables accurate mapping in highly polymorphic regions of the genome to deliver a more precise secondary analysis solution. The option also exists to build custom pangenome references to better represent specific populations and further reduce ancestry bias.
Developing a Full Picture
In addition, DNA alone does not provide enough data to fully understand biology and disease. Modern analysis pipelines and tools are built to interpret multiple omes so that insights can be combined and compared. Farshad Nassiri, MD, PhD, neurosurgeon and scientist at the University of Toronto, emphasized this point: “You gain unique information from looking at different data types like the epigenome, genome, and transcriptome together. Combined data let you identify new information that you were not able to find using only one data type.”
For instance, Horton et al. demonstrated in a germline DNA and RNA sequencing study of a cohort of 43,524 individuals that paired DNA and RNA sequencing is associated with improved identification of individuals with a hereditary cancer predisposition.2 Another research group showed that the Tempus xT assay (DNA and whole-exome capture RNA NGS) increased ALK fusion detection by 18% over DNA alone for patients with advanced non-small cell lung cancer (NSCLC).3
Accessing Fast Analysis
The latest version of Dynamic Read Analysis for GENomics (DRAGEN™ v4.3) from Illumina brings new accessible bioinformatics capabilities to the genomics field. This analysis platform has the ability to detect the entire known landscape of variations.4 DRAGEN is provided with many Illumina sequencers and is also available as a standalone user interface–based platform on the cloud.
In about 30 minutes of computation time, from raw reads to variant detection, users can gain more insight from individual genomes 16 times faster than with the commonly used legacy BWA GATK method**.5 Scientists who pay per hour for server time gain major time and cost savings that propel them faster and more accurately toward new discoveries in the quest to unravel disease.
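To make the time claim concrete, the figures above (about 30 minutes, 16 times faster) imply a legacy runtime of roughly 8 hours per genome. The sketch below works through that arithmetic. The hourly server rates are purely hypothetical placeholders, not figures from the article, and real rates will differ between hardware types, so the two pipelines take separate rates.

```python
def implied_baseline_minutes(fast_minutes: float, speedup: float) -> float:
    """Runtime of the slower pipeline implied by a runtime and a speedup factor."""
    return fast_minutes * speedup


def compute_cost(minutes: float, rate_per_hour: float) -> float:
    """Cost of a run billed per hour of server time."""
    return minutes / 60 * rate_per_hour


if __name__ == "__main__":
    fast_minutes = 30   # from the article: about 30 minutes per genome
    speedup = 16        # from the article: 16 times faster than the legacy method

    # HYPOTHETICAL hourly rates, for illustration only (not from the article).
    fast_rate = 2.0
    legacy_rate = 2.0

    legacy_minutes = implied_baseline_minutes(fast_minutes, speedup)
    print(f"Implied legacy runtime: {legacy_minutes / 60:.1f} hours")
    print(f"Cost per genome, accelerated: {compute_cost(fast_minutes, fast_rate):.2f}")
    print(f"Cost per genome, legacy: {compute_cost(legacy_minutes, legacy_rate):.2f}")
```

With equal hourly rates the cost ratio simply mirrors the 16-fold speedup. If accelerated hardware costs more per hour, the saving shrinks accordingly.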
References
1. Fact Sheet: Human Genome Project
2. Horton C et al. Diagnostic Outcomes of Concurrent DNA and RNA Sequencing in Individuals Undergoing Hereditary Cancer Testing. JAMA Oncol. 2024;10(2):212–219. doi:10.1001/jamaoncol.2023.5586
3. Iams W et al. 182P ALK fusion detection by RNA next-generation sequencing (NGS) compared to DNA in a large, real-world non-small cell lung cancer (NSCLC) dataset. Ann Oncol. 2023;34(Suppl 2):S254. doi:10.1016/j.annonc.2023.09.2906
4. Behera S et al. Comprehensive and accurate genome analysis at scale using DRAGEN accelerated algorithms. bioRxiv [Preprint]. 2024 Jan 6:2024.01.02.573821. doi:10.1101/2024.01.02.573821
5. Betschart RO et al. Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment. Sci Rep. 2022;12:21502. doi:10.1038/s41598-022-26181-3
To learn more about the newest apps and tools to analyze DNA and RNA, download our NEW ebook on multiomics data analysis workflows: ilmn.ly/informatics-ebook.
* For Research Use Only. Not for use in diagnostic procedures.
** BWA GATK 4.1.4.0 runtime on a local 2x Intel Xeon Gold 6126 (48 threads) with 394 GB RAM and 2 TB NVMe SSD using BCBIO for parallelization.
M-AMR-01549