Using structural motif descriptors for sequence-based binding site prediction

Publication: Contribution to journal › Research article › Contributed › Peer-reviewed

Contributors

Abstract

BACKGROUND: Many protein sequences are still poorly annotated. Functional characterization of a protein is often improved by the identification of its interaction partners. Here, we aim to predict protein-protein interactions (PPI) and protein-ligand interactions (PLI) at the sequence level using 3D information. To this end, we use machine learning to compile sequential segments that constitute structural features of an interaction site into one profile hidden Markov model descriptor. The resulting collection of descriptors can be used to screen sequence databases in order to predict functional sites.

RESULTS: We generate descriptors for 740 classified types of protein-protein binding sites and for more than 3,000 protein-ligand binding sites. Cross-validation reveals that two thirds of the PPI descriptors are sufficiently conserved and significant enough to be used for binding site recognition. We further validate 230 PPIs that were extracted from the literature, where we additionally identify the interface residues. Finally, we test ligand-binding descriptors for the case of ATP. From sequences with Swiss-Prot annotation "ATP-binding", we achieve a recall of 25% with a precision of 89%, whereas Prosite's P-loop motif recognizes an equal number of hits at the expense of a much higher number of false positives (precision: 57%). Our method yields 771 hits with a precision of 96% that were not previously picked up by any Prosite pattern.

CONCLUSION: The automatically generated descriptors are a useful complement to known Prosite/InterPro motifs. They serve to predict protein-protein as well as protein-ligand interactions, along with their binding site residues, for proteins for which only sequence information is available.

Details

Original language: English
Pages (from - to): S5
Journal: BMC Bioinformatics
Volume: 2007
Issue number: 8 Suppl 4
Publication status: Published - 22 May 2007
Peer-review status: Yes

External IDs

PubMedCentral PMC1892084
Scopus 34447617178
ORCID /0000-0003-2848-6949/work/141543402

Keywords

  • Amino Acid Motifs, Amino Acid Sequence, Binding Sites, Molecular Sequence Data, Protein Binding, Protein Interaction Mapping/methods, Proteins/chemistry, Sequence Alignment/methods, Sequence Analysis, Protein/methods

Library keywords