Accelerating minimap2 for long-read sequencing applications on modern CPUs (2024)

References

Chaisson, M. J. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1–16 (2019).
Conesa, A. et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 17, 1–19 (2016).
Beyter, D. et al. Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat. Genet. 53, 779–786 (2021).
Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).
De Coster, W., Weissensteiner, M. H. & Sedlazeck, F. J. Towards population-scale long-read sequencing. Nat. Rev. Genet. 22, 572–587 (2021).
PromethION Brochure (Oxford Nanopore Technologies, 2021); https://nanoporetech.com/sites/default/files/s3/literature/PromethION-brochure.pdf
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Guo, L., Lau, J., Ruan, Z., Wei, P. & Cong, J. Hardware acceleration of long read pairwise overlapping in genome sequencing: a race between FPGA and GPU. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines 127–135 (IEEE, 2019).
Zeni, A. et al. LOGAN: high-performance GPU-based X-drop long-read alignment. In 2020 IEEE International Parallel and Distributed Processing Symposium 462–471 (IEEE, 2020).
Feng, Z., Qiu, S., Wang, L. & Luo, Q. Accelerating long read alignment on three processors. In Proc. 48th International Conference on Parallel Processing 1–10 (ACM, 2019).
Roberts, M., Hayes, W., Hunt, B. R., Mount, S. M. & Yorke, J. A. Reducing storage requirements for biological sequence comparison. Bioinformatics 20, 3363–3369 (2004).
Abouelhoda, M. I. & Ohlebusch, E. Chaining algorithms for multiple genome comparison. J. Discrete Algorithms 3, 321–341 (2005).
Jain, C., Gibney, D. & Thankachan, S. V. Co-linear chaining with overlaps and gap costs. Preprint at https://www.biorxiv.org/content/10.1101/2021.02.03.429492v2 (2021).
Ho, D. et al. LISA: learned indexes for DNA sequence analysis. Preprint at https://arxiv.org/abs/1910.04728 (2020).
Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017).
Nurk, S., Koren, S., Rhie, A., Rautiainen, M. et al. The complete sequence of a human genome. Preprint at https://doi.org/10.1101/2021.05.26.445798 (2021).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Payne, A. et al. Readfish enables targeted nanopore sequencing of gigabase-sized genomes. Nat. Biotechnol. 39, 442–450 (2021).
Kovaka, S., Fan, Y., Ni, B., Timp, W. & Schatz, M. C. Targeted nanopore sequencing by real-time mapping of raw electrical signal with uncalled. Nat. Biotechnol. 39, 431–441 (2021).
Zhang, H. et al. Real-time mapping of nanopore raw signals. Bioinformatics https://doi.org/10.1093/bioinformatics/btab264 (2021).
Jain, C., Rhie, A., Hansen, N., Koren, S. & Phillippy, A. M. A long read mapping method for highly repetitive reference sequences. Preprint at https://www.biorxiv.org/content/10.1101/2020.11.01.363887v1.full (2020).
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
Ren, J. & Chaisson, M. lRA: the long read aligner for sequences and contigs. Preprint at https://doi.org/10.1371/journal.pcbi.1009078 (2020).
Kraska, T., Beutel, A., Chi, E. H., Dean, J. & Polyzotis, N. The case for learned index structures. In ACM International Conference on Management of Data 489–504 (ACM, 2018).
Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R. & Kraska, T. FITing-Tree: a data-aware index structure. In SIGMOD ’19: Proceedings of the 2019 International Conference on Management of Data 1189–1206 (ACM, 2019); https://doi.org/10.1145/3299869.3319860
Ferragina, P. & Vinciguerra, G. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB 13, 1162–1175 (2020).
Ding, J. et al. ALEX: an updatable adaptive learned index. In SIGMOD ’20: Proceedings of the 2020 International Conference on Management of Data 969–984 (ACM, 2020); https://doi.org/10.1145/3318464.3389711
Wu, Y., Yu, J., Tian, Y., Sidle, R. & Barber, R. Designing succinct secondary indexing mechanism by exploiting column correlations. In SIGMOD ’19: Proceedings of the 2019 International Conference on Management of Data 1223–1240 (ACM, 2019); https://doi.org/10.1145/3299869.3319861
Kirsche, M., Das, A. & Schatz, M. C. Sapling: accelerating suffix array queries with learned data models. Bioinformatics 37, 744–749 (2021).
Marcus, R. et al. Benchmarking learned indexes. PVLDB 14, 1–13 (2021).
Marcus, R., Zhang, E. & Kraska, T. CDFShop: exploring and optimizing learned index structures. In SIGMOD ’20: Proc. 2020 ACM SIGMOD International Conference on Management of Data 2789–2792 (ACM, 2020); https://doi.org/10.1145/3318464.3384706
Suzuki, H. & Kasahara, M. Introducing difference recurrence relations for faster semi-global alignment of long sequences. BMC Bioinformatics 19, 33–47 (2018).
Cheng, H., Concepcion, G., Feng, X., Zhang, H. & Li, H. Human Assemblies Evaluated in the Hifiasm Paper (Zenodo, 2020); https://doi.org/10.5281/zenodo.4393631
Kalikar, S., Jain, C., Md, V. & Misra, S. mm2-fast Source Code Used in the Manuscript—Accelerating Minimap2 for Long-Read Sequencing Applications on Modern CPUs (Zenodo, 2022); https://doi.org/10.5281/zenodo.5888171
Kalikar, S., Jain, C., Md, V. & Misra, S. Scripts Used for the Experiments in the Manuscript—Accelerating Minimap2 for Long-Read Sequencing Applications on Modern CPUs (Zenodo, 2022); https://doi.org/10.5281/zenodo.5884451
