Extreme-Scale Many-against-Many Protein Similarity Search

Type: Article

Publication Date: 2022-11-01

Citations: 4

DOI: https://doi.org/10.1109/sc41404.2022.00006

Abstract

Similarity search is one of the most fundamental computations that are regularly performed on ever-increasing protein datasets. Scalability is of paramount importance for uncovering novel phenomena that occur at very large scales. We unleash the power of over 20,000 GPUs on the Summit system to perform all-vs-all protein similarity search on one of the largest publicly available datasets with 405 million proteins, in less than 3.5 hours, cutting the time-to-solution for many use cases from weeks. The variability of protein sequence lengths, as well as the sparsity of the space of pairwise comparisons, make this a challenging problem in distributed memory. Due to the need to construct and maintain a data structure holding indices to all other sequences, this application has a huge memory footprint that makes it hard to scale the problem sizes. We overcome this memory limitation by innovative matrix-based blocking techniques, without introducing additional load imbalance.
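The memory argument in the abstract can be illustrated with a small sketch. It assumes, as in sparse-matrix formulations of this problem, that candidate pairs are found by counting shared k-mers through an inverted index, and that the pairwise overlap space is processed one block at a time so that only one block of counts is held in memory. The function names (`build_kmer_index`, `overlap_block`, `blocked_candidates`) and the parameters `k`, `block`, and `min_shared` are hypothetical. This is a single-process illustration of blocking in general, not the authors' distributed GPU implementation, whose blocking and load-balancing details are not given in the abstract.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_kmer_index(seqs, k):
    """Inverted index: k-mer -> list of ids of the sequences containing it."""
    index = defaultdict(list)
    for sid, seq in enumerate(seqs):
        for km in kmers(seq, k):
            index[km].append(sid)
    return index


def overlap_block(index, row_range, col_range):
    """Count shared k-mers for pairs (i, j), i in row_range, j in col_range, i < j."""
    counts = defaultdict(int)
    for ids in index.values():
        rows = [i for i in ids if i in row_range]
        cols = [j for j in ids if j in col_range]
        for i in rows:
            for j in cols:
                if i < j:  # the overlap space is symmetric; keep one triangle
                    counts[(i, j)] += 1
    return counts


def blocked_candidates(seqs, k=3, block=2, min_shared=2):
    """Yield ((i, j), shared) for candidate pairs, one output block at a time.

    Only the counts of the current block are alive at any moment, so peak
    memory depends on the block size rather than on the full pairwise space.
    """
    index = build_kmer_index(seqs, k)
    n = len(seqs)
    for r in range(0, n, block):
        for c in range(r, n, block):  # upper-triangular blocks only
            rows = range(r, min(r + block, n))
            cols = range(c, min(c + block, n))
            blk = overlap_block(index, rows, cols)
            for pair, shared in sorted(blk.items()):
                if shared >= min_shared:
                    yield pair, shared


if __name__ == "__main__":
    toy = ["MKVLAA", "MKVLGG", "TTTTTT", "AKVLAA"]  # made-up toy sequences
    for pair, shared in blocked_candidates(toy, k=3, block=2, min_shared=2):
        print(pair, shared)
```

The block size trades memory for repeated passes over the index: smaller blocks hold fewer counts at once but scan the index more often, and the set of candidate pairs is the same for any block size.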

Locations

  • arXiv (Cornell University)

Similar Works

  • Extreme-scale many-against-many protein similarity search (2023). Oğuz Selvitopi, Saliya Ekanayake, Giulia Guidi, Muaaz Gul Awan, Georgios A. Pavlopoulos, Ariful Azad, Nikos C. Kyrpides, Leonid Oliker, Katherine Yelick, Aydın Buluç
  • Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices (2020). Oğuz Selvitopi, Saliya Ekanayake, Giulia Guidi, Georgios A. Pavlopoulos, Ariful Azad, Aydın Buluç
  • Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale (2021). Md Taufique Hussain, Oğuz Selvitopi, Aydın Buluç, Ariful Azad
  • Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale (2020). Md Taufique Hussain, Oğuz Selvitopi, Aydın Buluç, Ariful Azad
  • Space Efficient Sequence Alignment for SRAM-Based Computing: X-Drop on the Graphcore IPU (2023). Luk Burchard, Max Zhao, Johannes Langguth, Aydın Buluç, Giulia Guidi
  • Parallel and Scalable Precise Clustering for Homologous Protein Discovery (2019). Stuart Byma, Akash Dhasade, Adrian Altenhoff, Christophe Dessimoz, James R. Larus
  • Lightning-fast adaptive immune receptor similarity search by symmetric deletion lookup (2024). Touchchai Chotisorayuth, Andreas Tiffeau-Mayer
  • LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment (2020). Alberto Zeni, Giulia Guidi, Marquita Ellis, Nan Ding, Marco D. Santambrogio, Steven Hofmeyr, Aydın Buluç, Leonid Oliker, Katherine Yelick
  • Essential guidelines for computational method benchmarking (2018). Lukas M. Weber, Wouter Saelens, Robrecht Cannoodt, Charlotte Soneson, Alexander Hapfelmeier, Paul P. Gardner, Anne‐Laure Boulesteix, Yvan Saeys, Mark D. Robinson
  • Essential guidelines for computational method benchmarking (2019). Lukas M. Weber, Wouter Saelens, Robrecht Cannoodt, Charlotte Soneson, Alexander Hapfelmeier, Paul P. Gardner, Anne‐Laure Boulesteix, Yvan Saeys, Mark D. Robinson
  • SALoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs (2022). Seongyeon Park, Hajin Kim, Tanveer Ahmad, Nauman Ahmed, Zaid Al-Ars, H. Peter Hofstee, Youngsok Kim, Jinho Lee
  • SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs (2023). Seongyeon Park, Hajin Kim, Tanveer Ahmad, Nauman Ahmed, Zaid Al-Ars, H. Peter Hofstee, Youngsok Kim, Jinho Lee
  • High-Performance Cloud Computing for Exhaustive Protein–Protein Docking (2021). Masahito Ohue, Kento Aoyama, Yutaka Akiyama
  • Comparing Performance and Portability Between CUDA and SYCL for Protein Database Search on NVIDIA, AMD, and Intel GPUs (2023). Manuel Costanzo, Enzo Rucci, Carlos García, Marcelo Naiouf, Manuel Prieto

Works That Cite This (0)
