Protein expression levels play a crucial role in shaping biological phenotypes. Given the sexual dimorphism between genetic females (XX) and genetic males (XY), substantial differences in protein expression are expected, yet they remain underexplored.
This thesis, part of a larger collaborative project, aims to replicate sex-specific proteomic patterns identified in the UK Biobank (UKB) and to evaluate the transferability of a proteomic sex prediction model. Using longitudinal data from the Multi-Ethnic Study of Atherosclerosis (MESA) cohort, it examines the consistency of these findings across ethnic groups and investigates age-related changes in sex-bias estimates and in ProtSexIndex, the deviation between genetic sex and proteomic sex.
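ProtSexIndex is described here only informally, as the deviation between genetic sex and proteomic sex. As a minimal sketch, assuming proteomic sex is the model's predicted probability of being male and genetic sex is coded 1 for XY and 0 for XX, one hypothetical way to compute it is the absolute difference between the two. The function name `prot_sex_index`, the coding, and the absolute-difference definition are illustrative assumptions, not the thesis's own definition.

```python
import numpy as np


def prot_sex_index(p_male, genetic_male):
    """Hypothetical ProtSexIndex: |P(male) - genetic sex|.

    p_male:       predicted probability of being male (proteomic sex), in [0, 1]
    genetic_male: genetic sex coded 1 = XY, 0 = XX (assumed coding)
    """
    p_male = np.asarray(p_male, dtype=float)
    genetic_male = np.asarray(genetic_male, dtype=float)
    if p_male.shape != genetic_male.shape:
        raise ValueError("p_male and genetic_male must have the same shape")
    # 0 = proteome fully concordant with genetic sex, 1 = fully discordant
    return np.abs(p_male - genetic_male)


# Two genetic males and two genetic females with illustrative predictions
index = prot_sex_index([0.97, 0.80, 0.05, 0.30], [1, 1, 0, 0])
print(index)
```

Under this hypothetical definition a larger value means a participant's proteome looks less typical of their genetic sex, so the same quantity can be compared across males and females and across exams.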
Extensive preprocessing and quality control were performed to ensure robust and reliable results. Building on the logistic regression approach used to identify sex-biased proteins in UKB and on an XGBoost model trained in UKB to estimate proteomic sex, this thesis evaluates the model’s transferability through two Reciprocal Generalizability Tests and extends the analysis to cohort-specific differences and temporal dynamics.
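Transferability is summarized below as an AUC with a 95% confidence interval for a model applied to a cohort it was not trained on. As a minimal sketch, assuming a percentile bootstrap over participants (the thesis may instead have used an analytic method such as DeLong's), the AUC can be computed as the scaled Mann-Whitney U statistic and resampled. The function names `auc_rank` and `bootstrap_auc_ci`, the label coding, and the toy data are hypothetical.

```python
import numpy as np


def auc_rank(y_true, scores):
    """AUC computed as the Mann-Whitney U statistic scaled to [0, 1].

    y_true: 1 = genetic male (XY), 0 = genetic female (XX)  (assumed coding)
    scores: predicted probability of being male from the transferred model
    """
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    n_pos, n_neg = int(y.sum()), int((~y).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both classes")
    # Average 1-based ranks so that tied scores share credit equally.
    _, inv, counts = np.unique(s, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    ranks = (ends - (counts - 1) / 2.0)[inv]
    u = ranks[y].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) confidence interval."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    boot = []
    while len(boot) < n_boot:
        idx = rng.integers(0, len(y), size=len(y))  # resample participants
        if y[idx].all() or not y[idx].any():
            continue  # a one-class resample has no defined AUC
        boot.append(auc_rank(y[idx], s[idx]))
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return auc_rank(y, s), (float(lo), float(hi))


# Illustrative toy data, not MESA results
y = [0, 0, 1, 1, 0, 1, 1, 0]
p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.70, 0.05]
auc, (lo, hi) = bootstrap_auc_ci(y, p, n_boot=500)
print(f"AUC = {auc:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```

In a reciprocal test the same evaluation would be run in both directions (train in one cohort, score the other, then swap), so the two AUCs can be compared on an equal footing.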
The UKB-trained model achieved high performance in MESA (AUC = 0.9938, 95% CI [0.9903, 0.9967]). Nevertheless, because Normalized Protein eXpression (NPX) values are relative rather than absolute measurements, cohort-specific differences in their scale introduced calibration challenges. Longitudinal analyses showed that sex-bias estimates were stable across exams (Spearman correlation > 0.95, p < 1e-15), while the magnitude of sex differences decreased significantly with age for 923 of 2917 proteins (z-test). ProtSexIndex increased significantly with age in both males (t = -3.75, p = 0.0002) and females (t = -6.38, p < 1e-9), indicating that proteomic sex is dynamic rather than fixed. Multi-ethnic analyses showed consistent sex-bias estimates and robust model performance across racial and ethnic groups, underscoring the generalizability of these findings.
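The reduction in sex differences with age is reported from a z-test, but its exact form is not given here. A minimal sketch, assuming the standard test for the difference between two coefficient estimates, z = (b_early - b_late) / sqrt(se_early² + se_late²), applied per protein to the sex-bias coefficients from two exams: a protein counts as attenuated when the difference is significant and its absolute coefficient is smaller at the later exam. The function name, the independence assumption, and the 0.05 threshold are illustrative, and any multiple-testing correction used in the thesis is not reproduced.

```python
import math

import numpy as np


def attenuation_z_test(b_early, se_early, b_late, se_late, alpha=0.05):
    """Per-protein z-test for a change in a sex-bias coefficient between exams.

    Treats the two estimates as independent, which ignores the correlation
    induced by measuring the same participants twice (an assumption).
    Returns z, two-sided p-values, and a boolean mask of proteins whose sex
    bias moved significantly toward zero at the later exam.
    """
    b1, s1, b2, s2 = (np.atleast_1d(np.asarray(x, dtype=float))
                      for x in (b_early, se_early, b_late, se_late))
    z = (b1 - b2) / np.sqrt(s1**2 + s2**2)
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    attenuated = (p < alpha) & (np.abs(b2) < np.abs(b1))
    return z, p, attenuated


# Three hypothetical proteins: sex-bias estimates and standard errors per exam
z, p, att = attenuation_z_test(
    b_early=[0.80, -0.50, 0.30], se_early=[0.05, 0.05, 0.05],
    b_late=[0.60, -0.48, 0.45], se_late=[0.05, 0.05, 0.05],
)
print(att)
```

Here only the first protein is flagged: the second changes too little to be significant, and the third changes significantly but moves away from zero.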