February 22, 2022

Accurate long-read de novo assembly evaluation with Inspector

Zechen Chong, Ph.D., assistant professor in the Department of Genetics, is the latest winner of the Heersink School of Medicine’s Featured Discovery. This initiative celebrates important research from Heersink faculty members.

Chong’s study, “Accurate long-read de novo assembly evaluation with Inspector,” was published in Genome Biology.

Chong explains that long-read sequencing techniques are revolutionizing genomics research and have great potential for characterizing the full spectrum of genomic variations (mutations), which contribute to a wide variety of phenotypes, including different types of human diseases.

“Long-read de novo assembly is currently a concrete approach toward achieving this goal,” said Chong. “However, long reads have higher error rates, and there is a lack of effective tools for accurately evaluating the assembly results, especially identifying structural errors. Therefore, we developed Inspector to faithfully report long-read de novo assembly errors. Notably, Inspector can correct the assembly errors that are detrimental to downstream analysis, such as genotype-phenotype association analysis.”

Maggi Chen, a graduate student at UAB, is the lead author on this study.

“When developing Inspector, I wanted to provide our community with a reliable tool to accurately assess their assembly results,” said Chen. “It is challenging to identify assembly errors in the repetitive regions of the human genome. But with the help of Dr. Chong and our co-authors, I successfully addressed all of the major issues and revealed some limitations of current long-read assemblers.”

This work was supported by a grant from the National Institute of General Medical Sciences, the BioData Catalyst Fellowship from the National Heart, Lung, and Blood Institute, and the Center for Clinical and Translational Science grant from the National Center for Advancing Translational Sciences.

The Heersink communications staff sat down with Dr. Chong to gain insights about the research in this study, UAB, and the science community.

Q: What compelled you to pursue this research?

Chong: I have been working in the field of genomics and bioinformatics for 15 years. My focus has been genome assembly and variation characterization, two fundamental technical questions that are critical for advancing biomedical research. Short-read sequencing platforms have been successfully and widely used for large consortium projects with national and international efforts (e.g., the 1000 Genomes Project, The Cancer Genome Atlas (TCGA), and the International Cancer Genome Consortium (ICGC)) and for small projects in almost every genetic research lab. However, short reads cannot achieve the goals of reconstructing complete genomes and depicting a comprehensive variation map of individual genomes. This is because short reads (100-150 bp) are too short to unambiguously reconstruct original genomes or to map to the reference genome, owing to pervasive repetitive sequences, high levels of heterozygosity, and the large number, types, and sizes of variations. Long reads (10-30 kbp) can alleviate these issues and thus are promising in a variety of applications. For example, the Human Genome Structural Variation Consortium (HGSVC) identified a 7-fold increase in structural variations using long reads compared to traditional short sequencing reads. Our lab is focused on algorithm development based on long-read sequencing data and relevant applications.

Q: How do you feel your research will impact the science community?

Chong: Long-read de novo assembly is an approach that is widely used in both basic and clinical research. Researchers are trying to reconstruct the original genomes of different species and to characterize all types of variations, including disease-causing mutations. Genome assembly previously relied on an existing reference genome to evaluate the assembly results. A major challenge for reference-based analysis is distinguishing true variations from assembly errors. Inspector is the first tool to facilitate the discovery of long-read assembly errors, including both small- and large-scale errors. Inspector will help improve the assembly quality. Accurate assembly results are the basis for variant discovery, genome annotation, and subsequent functional studies.

Maggi Chen, graduate student and lead author

Q: What is your research’s relevance to human disease?

Chong: For both rare and common diseases, genomic variants, including single nucleotide variations (SNVs), small insertions and deletions (indels), structural variations (SVs), and complex forms of these variants, play an extremely important, if not deterministic, role. Our research is focused on accurate characterization of all forms of variations and understanding their functional impacts. In the past, it was challenging to achieve this goal due to the limitations of the methods then available. Thanks to advances in long-read sequencing techniques and improved bioinformatics algorithms, we are much closer to this possibility. Inspector, along with other tools developed or under development in our lab, will pave a path towards this goal.

Q: How has being at UAB and living in Birmingham affected your research?

Chong: UAB and the Heersink School of Medicine have provided a superb environment for us to conduct our innovative research. Our High-Performance Computing (HPC) server, Cheaha, has provided powerful computational resources that have been essential for accomplishing this and many other projects. The Informatics Institute and the Department of Genetics have provided tremendous support for grant applications and other administrative needs. CCTS and OHDRC have also provided exceptional training and research support for my career development.

Q: What do you find makes the science community here unique?

Chong: UAB Heersink has diverse and outstanding research teams and fosters interdisciplinary collaboration. As head of a bioinformatics research lab, I have been privileged to develop an extensive network with outstanding collaborators for conducting cutting-edge research. For example, this project has been an interdisciplinary team effort. In addition to the hard work of my Ph.D. student, first-author Yu (Maggi) Chen, master’s student Yixin Zhang from the Computer Science Department and Drs. Amy Wang and Min Gao from the Department of Medicine and the Informatics Institute have also contributed significantly to make our project a great success.