The Need for Automated Algorithms to Accelerate Clinical Sequencing Data Analytics

TAIPEI, TAIWAN, Nov. 1, 2021 - Two decades ago, scientists completed the first draft of the human genome, yet the promise of using genomics for routine diagnosis and treatment is still being realized. As the cost of DNA sequencing falls, more and more hospitals are adopting clinical Whole Exome Sequencing (WES) or Whole Genome Sequencing (WGS) services instead of targeted sequencing to confirm diagnoses for patients who might have genetic disorders.

Because WES was expensive in the past, narrowly targeted sequencing tests were more widely accepted for clinical diagnosis. Targeted sequencing is designed to detect specific variants in a limited set of coding regions, which makes it difficult to find unknown variants that might be the cause of a genetic disorder. Newer technology allows assessment of whole exomes or entire genomes and can detect millions of genetic variants. To keep expert analysis accurate, fast, and affordable, automated algorithms are needed to parse the raw data and help distinguish true variants from those caused by systematic errors.

The Burrows-Wheeler Aligner (BWA-MEM) for sequence alignment and the Genome Analysis Toolkit (GATK) for variant calling are among the most widely recommended software tools for DNA short-read analysis, and automated pipelines are typically built on them as the quality standard. BWA-MEM is without doubt one of the most popular tools for sequence alignment. As the first step in processing millions of raw sequence reads, BWA-MEM aligns them to a reference genome for a variety of germline and somatic variant detection workflows. GATK identifies single nucleotide variants (SNVs) and short insertions and deletions (indels), and is used in a variety of workflows, such as germline and somatic short variant calling and copy number variant discovery. With the evolution of the technology and the substantial reduction in the cost of genome sequencing, more and more human genome sequences are being collected and analyzed to support clinical diagnosis. Scientists need to be able to process these reads more efficiently and more cost-effectively.
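
As a rough illustration of how these two steps fit together, the Python sketch below only assembles the command lines for the open-source `bwa mem` and `gatk HaplotypeCaller` tools; it does not run them. The file names, the thread count, and the helper function names are hypothetical and chosen for illustration, and a production pipeline would normally add sorting, duplicate marking, and other steps in between.

```python
from typing import List


def bwa_mem_command(reference: str, reads_1: str, reads_2: str,
                    threads: int = 8) -> List[str]:
    """Build the argument list for a paired-end `bwa mem` alignment.

    `-t` sets the number of threads; the positional arguments are the
    indexed reference FASTA followed by the two FASTQ files.
    """
    return ["bwa", "mem", "-t", str(threads), reference, reads_1, reads_2]


def haplotype_caller_command(reference: str, bam: str, vcf: str) -> List[str]:
    """Build the argument list for germline short variant calling with GATK.

    `-R` is the reference, `-I` the aligned (sorted, indexed) BAM,
    and `-O` the output VCF.
    """
    return ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", vcf]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    align = bwa_mem_command("ref.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz")
    call = haplotype_caller_command("ref.fa", "sample.sorted.bam", "sample.vcf.gz")
    # Intermediate steps (SAM-to-BAM conversion, sorting, duplicate marking)
    # are intentionally omitted from this sketch.
    print(" ".join(align))
    print(" ".join(call))
```

Building the commands as lists keeps the sketch testable without the tools installed; the lists could be handed to `subprocess.run` once the tools and input files are actually available.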

BWA-MEM in particular is a memory-bound application, and cache misses have traditionally been regarded as one of its most serious bottlenecks. As the number of cores and threads on HPC platforms increases, low parallelism and poor thread scalability become more serious problems and significantly impair computational performance. Despite extensive optimization efforts, BWA-MEM still scales poorly across threads, since most optimization work has not taken advantage of the characteristics of the underlying hardware architecture.
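
One common way to reason about this kind of scaling limit is Amdahl's law, which is not mentioned in this release and is used here only as an illustrative model. The Python sketch below assumes a hypothetical parallel fraction of 0.9; that number is not a measured property of BWA-MEM, and the model ignores memory contention, which would make real scaling worse.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the runtime that can be parallelized (0..1).
    threads: number of threads working on the parallel share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Hypothetical parallel fraction, chosen only for illustration.
    fraction = 0.9
    for threads in (1, 8, 16, 32, 64):
        speedup = amdahl_speedup(fraction, threads)
        print(f"{threads:>2} threads -> {speedup:.2f}x")
```

Under this assumption the speedup can never exceed 10x however many threads are added, which is the general pattern behind the diminishing returns described above.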

WASAI Lightning offers high-performance BWA-MEM and GATK analysis deployed on datacenter-scale FPGA cards from Intel and Xilinx, combining acceleration technologies for both memory-bound and compute-bound workloads. It offers the same usage and command line interface (CLI) as native BWA-MEM, making migration seamless for users. The WASAI Lightning platform is fully integrated with server-grade CPUs and FPGAs, such as Intel Xeon Scalable Processors, AMD EPYC CPUs, Intel FPGA Programmable Acceleration Cards, and Xilinx FPGA Alveo Accelerator Cards. It is implemented with a state-of-the-art hierarchical memory management architecture and an instruction pipeline system to handle the frequent and highly varied data access patterns of genomic big data. Consequently, we are able to provide a highly accurate computational genomics analysis solution with significantly decreased execution time.

