SHARK: Web server for alignment-free homology assessment for intrinsically disordered and unalignable protein regions

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

  • Chi Fung Willis Chow, Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden (CSBD), Clusters of Excellence PoL: Physics of Life (Author)
  • Maxim Scheremetjew, Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden (CSBD) (Author)
  • Hong Kee Moon, Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden (CSBD) (Author)
  • Soumyadeep Ghosh, Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden (CSBD) (Author)
  • Anna Hadarovich, Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden (CSBD) (Author)
  • Lena Hersemann, Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden (CSBD) (Author)
  • Agnes Toth-Petroczy, Max Planck Institute of Molecular Cell Biology and Genetics, Center for Systems Biology Dresden (CSBD), Clusters of Excellence PoL: Physics of Life (Author)

Abstract

Whereas alignment has been fundamental to sequence-based assessments of protein homology, it is ineffective for intrinsically disordered regions (IDRs) due to their lowered sequence conservation and unique sequence properties. Here, we present a web server implementation of SHARK (bio-shark.org), an alignment-free algorithm for homology classification that compares the overall amino acid composition and short regions (k-mers) shared between sequences (SHARK-scores). The output of such k-mer-based comparisons is used by SHARK-dive, a machine learning classifier to detect homology between unalignable, disordered sequences. SHARK-web provides sequence-versus-database assessment of protein sequence homology akin to conventional tools such as BLAST and HMMER. Additionally, we provide precomputed sets of IDR sequences from 16 model organism proteomes facilitating searches against species-specific IDR-omes. SHARK-dive offers superior overall homology detection performance to BLAST and HMMER, driven by a large increase in sensitivity to low sequence identity homologs, and can be used to facilitate the study of sequence-function relationships in disordered, difficult-to-align regions.
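To make the idea of alignment-free comparison concrete, the sketch below compares two sequences by their shared k-mers and by their overall amino acid composition, the two kinds of signal the abstract mentions. It is only an illustration under stated assumptions: the function names (`kmers`, `kmer_jaccard`, `composition_cosine`), the choice of Jaccard and cosine similarity, and the default k = 3 are all hypothetical and are not the SHARK-scores or the SHARK-dive classifier, whose actual definitions are given in the article.

```python
from collections import Counter
from math import sqrt


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the k-mer sets of a and b.

    Illustrative stand-in for a k-mer-based score; NOT the SHARK-score.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:  # both sequences are shorter than k
        return 0.0
    return len(ka & kb) / len(union)


def composition_cosine(a: str, b: str) -> float:
    """Cosine similarity of the residue-count vectors of a and b.

    Illustrative stand-in for a composition comparison; order of residues
    is ignored, so no alignment is needed.
    """
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[res] * cb[res] for res in ca)
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:  # at least one empty sequence
        return 0.0
    return dot / (norm_a * norm_b)
```

Both functions depend only on which k-mers or residues occur, not on where they occur, which is why comparisons of this kind remain usable for sequences that cannot be aligned position by position.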

Details

Original language: English
Pages (from-to): W512-W519
Journal: Nucleic Acids Research
Volume: 53
Issue number: W1
Publication status: Published - 7 Jul 2025
Peer-reviewed: Yes

External IDs

PubMed 40396357