Hierarchical Algorithm for Identifying Semi-Rigid Domains in Proteins

School of Chemical Engineering | Bachelor's thesis

Mcode

CHEM3054

Language

en

Pages

50+5

Abstract

Understanding protein dynamics is crucial for unraveling biological functions and mechanisms. This thesis presents a novel algorithm designed to identify semi-rigid domains and super-domains within proteins using molecular dynamics (MD) simulation data. The algorithm combines a greedy approach for domain construction with a divide-and-conquer strategy for super-domain identification. A key innovation is the data-driven parameter determination method, which uses Gaussian mixture models (GMMs) to select optimal thresholds from the input data, addressing the challenge of inconsistent cut-off values in existing quasi-rigid domain identification algorithms. The algorithm was tested on three proteins of increasing complexity: Glycophorin A, β2-adrenergic receptor, and a viral capsid protein. Results demonstrate its ability to accurately identify biologically relevant semi-rigid regions, as validated against MD trajectories. For instance, rigid α-helices were correctly distinguished from flexible loops, while the hierarchical design of the algorithm enabled efficient processing of larger systems. The results also highlight the sensitivity of threshold values to GMM configurations, indicating the need for automated parameter selection. This work extends beyond protein analysis, offering potential applications in dimensionality reduction, featurization for machine learning (ML) analysis, and object identification in engineered systems. By addressing limitations of existing algorithms and proposing robust solutions, this study provides a step forward in computational biophysics.
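
As a rough illustration of the GMM-based threshold selection mentioned in the abstract, the sketch below fits a two-component, one-dimensional Gaussian mixture with plain EM and takes the point between the two component means where the weighted densities are equal as the cut-off. This is not the thesis implementation: the abstract does not state which quantity is thresholded, how many components are used, or how the threshold is read off the fitted mixture. The function names (`fit_two_component_gmm`, `gmm_threshold`) and the synthetic input values are hypothetical and chosen only for illustration.

```python
import math
import random


def log_weighted_normal(x, weight, mean, var):
    """Log of weight * N(x | mean, var), kept in log space to avoid underflow."""
    return (math.log(weight)
            - 0.5 * math.log(2.0 * math.pi * var)
            - (x - mean) ** 2 / (2.0 * var))


def fit_two_component_gmm(values, n_iter=200):
    """Fit a 1-D, two-component Gaussian mixture with plain EM (assumed setup)."""
    n = len(values)
    lo, hi = min(values), max(values)
    # Deterministic initialisation: means at the quarter points of the data range.
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    overall_mean = sum(values) / n
    overall_var = sum((v - overall_mean) ** 2 for v in values) / n
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            a = log_weighted_normal(v, weights[0], means[0], variances[0])
            b = log_weighted_normal(v, weights[1], means[1], variances[1])
            m = max(a, b)
            ea, eb = math.exp(a - m), math.exp(b - m)
            resp.append((ea / (ea + eb), eb / (ea + eb)))
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var_k, 1e-9)  # floor to keep the variance positive
            weights[k] = nk / n
    return weights, means, variances


def gmm_threshold(weights, means, variances, tol=1e-9):
    """Bisect for the point between the means where both weighted densities match."""
    lo_k, hi_k = (0, 1) if means[0] <= means[1] else (1, 0)

    def diff(x):
        return (log_weighted_normal(x, weights[lo_k], means[lo_k], variances[lo_k])
                - log_weighted_normal(x, weights[hi_k], means[hi_k], variances[hi_k]))

    left, right = means[lo_k], means[hi_k]
    if diff(left) <= 0 or diff(right) >= 0:
        raise ValueError("no density crossing between the component means")
    while right - left > tol:
        mid = 0.5 * (left + right)
        if diff(mid) > 0:
            left = mid
        else:
            right = mid
    return 0.5 * (left + right)


# Synthetic stand-in data (an assumption): small values for "rigid" pairs,
# larger and more spread-out values for "flexible" pairs.
rng = random.Random(0)
values = ([rng.gauss(0.5, 0.1) for _ in range(300)]
          + [rng.gauss(3.0, 0.5) for _ in range(300)])
weights, means, variances = fit_two_component_gmm(values)
threshold = gmm_threshold(weights, means, variances)
print(f"threshold = {threshold:.3f}")
```

The cut-off obtained this way depends on the number of components and on the initialisation, which is one way to read the abstract's remark that threshold values are sensitive to the GMM configuration.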

Supervisor

Hummel, Michael

Thesis advisor

Kulig, Waldemar
