Accelerating minimap2 for long-read sequencing applications on modern CPUs (2024)

References

  1. Chaisson, M. J. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1–16 (2019).

  2. Conesa, A. et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 17, 1–19 (2016).

  3. Beyter, D. et al. Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat. Genet. 53, 779–786 (2021).

  4. Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).

  5. De Coster, W., Weissensteiner, M. H. & Sedlazeck, F. J. Towards population-scale long-read sequencing. Nat. Rev. Genet. 22, 572–587 (2021).

  6. PromethION Brochure (Nanopore Technologies, 2021); https://nanoporetech.com/sites/default/files/s3/literature/PromethION-brochure.pdf

  7. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  8. Guo, L., Lau, J., Ruan, Z., Wei, P. & Cong, J. Hardware acceleration of long read pairwise overlapping in genome sequencing: a race between FPGA and GPU. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines 127–135 (IEEE, 2019).

  9. Zeni, A. et al. LOGAN: high-performance GPU-based X-drop long-read alignment. In 2020 IEEE International Parallel and Distributed Processing Symposium 462–471 (IEEE, 2020).

  10. Feng, Z., Qiu, S., Wang, L. & Luo, Q. Accelerating long read alignment on three processors. In Proc. 48th International Conference on Parallel Processing 1–10 (ACM, 2019).

  11. Roberts, M., Hayes, W., Hunt, B. R., Mount, S. M. & Yorke, J. A. Reducing storage requirements for biological sequence comparison. Bioinformatics 20, 3363–3369 (2004).

  12. Abouelhoda, M. I. & Ohlebusch, E. Chaining algorithms for multiple genome comparison. J. Discrete Algorithms 3, 321–341 (2005).

  13. Jain, C., Gibney, D. & Thankachan, S. V. Co-linear chaining with overlaps and gap costs. Preprint at https://www.biorxiv.org/content/10.1101/2021.02.03.429492v2 (2021).

  14. Ho, D. et al. LISA: learned indexes for DNA sequence analysis. Preprint at https://arxiv.org/abs/1910.04728 (2020).

  15. Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017).

  16. Nurk, S., Koren, S., Rhie, A., Rautiainen, M. et al. The complete sequence of a human genome. Preprint at https://doi.org/10.1101/2021.05.26.445798 (2021).

  17. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).

  18. Payne, A. et al. Readfish enables targeted nanopore sequencing of gigabase-sized genomes. Nat. Biotechnol. 39, 442–450 (2021).

  19. Kovaka, S., Fan, Y., Ni, B., Timp, W. & Schatz, M. C. Targeted nanopore sequencing by real-time mapping of raw electrical signal with uncalled. Nat. Biotechnol. 39, 431–441 (2021).

  20. Zhang, H. et al. Real-time mapping of nanopore raw signals. Bioinformatics https://doi.org/10.1093/bioinformatics/btab264 (2021).

  21. Jain, C., Rhie, A., Hansen, N., Koren, S. & Phillippy, A. M. A long read mapping method for highly repetitive reference sequences. Preprint at https://www.biorxiv.org/content/10.1101/2020.11.01.363887v1.full (2020).

  22. Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).

  23. Ren, J. & Chaisson, M. lra: the long read aligner for sequences and contigs. Preprint at https://doi.org/10.1371/journal.pcbi.1009078 (2020).

  24. Kraska, T., Beutel, A., Chi, E. H., Dean, J. & Polyzotis, N. The case for learned index structures. In ACM International Conference on Management of Data 489–504 (ACM, 2018).

  25. Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R. & Kraska, T. FITing-Tree: a data-aware index structure. In SIGMOD ’19: Proceedings of the 2019 International Conference on Management of Data 1189–1206 (ACM, 2019); https://doi.org/10.1145/3299869.3319860

  26. Ferragina, P. & Vinciguerra, G. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB 13, 1162–1175 (2020).

  27. Ding, J. et al. ALEX: an updatable adaptive learned index. In SIGMOD ’20: Proceedings of the 2020 International Conference on Management of Data 969–984 (ACM, 2020); https://doi.org/10.1145/3318464.3389711

  28. Wu, Y., Yu, J., Tian, Y., Sidle, R. & Barber, R. Designing succinct secondary indexing mechanism by exploiting column correlations. In SIGMOD ’19: Proceedings of the 2019 International Conference on Management of Data 1223–1240 (ACM, 2019); https://doi.org/10.1145/3299869.3319861

  29. Kirsche, M., Das, A. & Schatz, M. C. Sapling: accelerating suffix array queries with learned data models. Bioinformatics 37, 744–749 (2021).

  30. Marcus, R. et al. Benchmarking learned indexes. In PVLDB Vol. 14, 1–13 (2021).

  31. Marcus, R., Zhang, E. & Kraska, T. CDFShop: exploring and optimizing learned index structures. In SIGMOD ’20: Proc. 2020 ACM SIGMOD International Conference on Management of Data 2789–2792 (ACM, 2020); https://doi.org/10.1145/3318464.3384706

  32. Suzuki, H. & Kasahara, M. Introducing difference recurrence relations for faster semi-global alignment of long sequences. BMC Bioinformatics 19, 33–47 (2018).

  33. Cheng, H., Concepcion, G., Feng, X., Zhang, H. & Li, H. Human Assemblies Evaluated in the Hifiasm Paper (Zenodo, 2020); https://doi.org/10.5281/zenodo.4393631

  34. Kalikar, S., Jain, C., Md, V. & Misra, S. mm2-fast Source Code Used in the Manuscript—Accelerating Minimap2 for Long-Read Sequencing Applications on Modern CPUs (Zenodo, 2022); https://doi.org/10.5281/zenodo.5888171

  35. Kalikar, S., Jain, C., Md, V. & Misra, S. Scripts Used for the Experiments in the Manuscript—Accelerating Minimap2 for Long-Read Sequencing Applications on Modern CPUs (Zenodo, 2022); https://doi.org/10.5281/zenodo.5884451
