Browsing by Author "Marttinen, Pekka, Prof., Aalto University, Department of Computer Science, Finland"
Now showing 1 - 4 of 4
- Efficient and Robust Algorithms for Extreme Multilabel Classification
School of Science | Doctoral dissertation (article-based) (2024) Qaraei, Mohammadreza
Extreme Multilabel Classification (XMC) refers to the problem of finding relevant labels from an extremely large label space, prevalent in applications such as recommender systems, web-scale document tagging, large language models, and question answering systems. This thesis investigates three fundamental challenges in XMC: storage and computational efficiency, robustness to data irregularities, and robustness against adversarial attacks. Concerning efficiency, the thesis highlights the high space and computational complexity of using meta classifiers for negative sampling in deep XMC models. It proposes a method based on Maximum Inner Product Search (MIPS) that achieves accuracy comparable to meta-classifier approaches while reducing space and computational demands by eliminating the need to train and store meta classifiers. To address data irregularities, the thesis explores unbiased estimates for tackling the missing-labels problem and rebalanced loss functions for managing data imbalance. It discusses the practical optimization challenges of unbiased estimates, namely the non-convexity and lack of a lower bound of unbiased loss functions, and proposes an alternative approach that employs convex surrogates for the unbiased 0-1 loss. Regarding robustness to adversarial attacks, the thesis first defines adversarial attacks in the multilabel setting of XMC models for text classification. It then evaluates the robustness of XMC models, focusing on the pervasive data imbalance in XMC datasets, which makes infrequent classes highly vulnerable to adversarial attacks. Finally, the thesis adapts rebalanced convex surrogates and shows that they significantly improve the robustness of infrequent classes against these attacks. Together, the findings advance the scalability, accuracy, and security of multilabel classification models in settings with extremely large label spaces.
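As a rough, hypothetical illustration of the MIPS-based negative sampling idea (not the dissertation's actual implementation), the sketch below retrieves a training instance's hardest incorrect labels by exact inner-product search over a toy label embedding matrix. A real XMC system would replace the exact search with an approximate MIPS index over millions of labels; all names and sizes here are invented.

```python
import numpy as np

def mips_hard_negatives(query_emb, label_embs, positive_ids, k=5):
    """Return k labels with the largest inner product to the query embedding,
    excluding the instance's positive labels (i.e., hard negatives)."""
    scores = label_embs @ query_emb            # inner product with every label
    scores[list(positive_ids)] = -np.inf       # never sample a positive as a negative
    return np.argpartition(-scores, k)[:k]     # indices of the top-k labels (unsorted)

rng = np.random.default_rng(0)
label_embs = rng.normal(size=(10_000, 64))     # toy label embedding matrix
query_emb = rng.normal(size=64)                # toy document embedding
print(mips_hard_negatives(query_emb, label_embs, positive_ids={3, 17}, k=5))
```

The appeal of this scheme, as described in the abstract, is that the hard negatives come directly from the label embedding space being learned, so no separate meta classifier needs to be trained or stored.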
- Efficient Transfer Learning with Sequential and Multi-Modal Approaches for Electronic Health Records
School of Science | Doctoral dissertation (article-based) (2024) Kumar, Yogesh
The digital transformation in healthcare has dramatically increased data availability, yet the potential for data-driven insights is frequently constrained by data quality. Securing high-quality data is particularly challenging in fields like healthcare, where expert involvement is crucial for gathering and annotating data. This thesis applies deep learning to Electronic Health Records (EHR) to enhance predictive accuracy and operational efficiency. Deep learning models are particularly adept at capturing the complex, non-linear relationships present in EHR data, but they require extensive training datasets to be effective. The thesis therefore develops ways to employ transfer learning to mitigate these data constraints. It tackles four key research questions aimed at improving healthcare outcomes using EHR data. It begins by enhancing prediction accuracy for healthcare utilization through an RNN model with multi-headed attention, which significantly outperforms traditional count-based models and generalizes robustly over time. The study then introduces SANSformer, a custom-built, attention-free sequential model tailored to the characteristics of EHR data. This model excels at predicting healthcare demand, particularly for diverse patient subgroups, and handles limited-data scenarios via transfer learning. Thirdly, the thesis improves neural network similarity metrics for assessing functional similarity, particularly in the context of transfer learning and model performance. It introduces a covariate adjustment that corrects traditional metrics, which are often misled by the structure of the input data, so that they better reflect true functional similarity. Lastly, it integrates expert annotations into a medical CLIP model, eCLIP, which uses radiologist eye-gaze heatmaps to substantially improve embedding quality and sample efficiency in multi-modal medical imaging. The findings highlight the significant potential of deep learning to enhance prediction of healthcare outcomes by addressing the unique challenges of EHR data. The research adapts sophisticated deep learning models to the complex demands of EHR data and introduces novel techniques, such as covariate adjustment for similarity metrics and the integration of expert annotations, laying a foundation for further advances in healthcare analytics.
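To make the covariate-adjustment idea concrete, here is a minimal, hypothetical sketch: it computes linear centered kernel alignment (CKA) between two representation matrices after regressing the shared input covariates out of both, so that similarity driven purely by common input structure is removed. The exact correction used in the thesis may differ; the choice of linear CKA, the function names, and the toy data are assumptions for illustration.

```python
import numpy as np

def linear_cka(x, y):
    """Linear centered kernel alignment between two representation matrices
    (rows = examples, columns = features)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    cross = np.linalg.norm(x.T @ y, "fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))

def covariate_adjusted_cka(x, y, covariates):
    """Regress the input covariates out of both representations by least squares,
    then compare the residuals, so shared input structure does not inflate similarity."""
    c = covariates - covariates.mean(axis=0)
    proj = c @ np.linalg.pinv(c)               # projection onto the covariate column space
    return linear_cka(x - proj @ x, y - proj @ y)

# Toy demo: two representations that are both near-linear functions of the same inputs
rng = np.random.default_rng(0)
covariates = rng.normal(size=(200, 10))        # shared inputs fed to both "models"
rep_a = covariates @ rng.normal(size=(10, 32)) + 0.1 * rng.normal(size=(200, 32))
rep_b = covariates @ rng.normal(size=(10, 32)) + 0.1 * rng.normal(size=(200, 32))
print(linear_cka(rep_a, rep_b), covariate_adjusted_cka(rep_a, rep_b, covariates))
```

In this toy setup the raw CKA is inflated by the shared covariates, while the adjusted score reflects only what remains after that common structure is projected out.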
- Interaction Detection with Probabilistic Deep Learning for Genetics
School of Science | Doctoral dissertation (article-based) (2023) Cui, Tianyu
Deep learning is an important machine learning tool in genetics because of its ability to model nonlinear relations between genotypes and phenotypes, such as genetic interactions, without assumptions about the form of those relations. However, current deep learning approaches are restricted in genetics applications by (i) the lack of well-calibrated uncertainty estimation about the model and (ii) the limited availability of individual-level data for model training. This thesis designs principled approaches to address these shortcomings in two statistical genetics applications: gene-gene interaction detection and genotype-phenotype prediction. First, we focus on interaction detection with deep learning. Using Bayesian principles, we provide calibrated uncertainty estimates for interactions detected by deep learning models, which are used to control statistical errors, such as the false positive and false negative rates, of the detected interactions. For genetic interaction detection, we design a novel neural network architecture that increases the power to detect complex gene-gene interactions by learning gene representations that aggregate information from all SNPs (single-nucleotide polymorphisms) of the genes being analyzed, and by considering interaction forms beyond the multiplicative interactions usually studied. Moreover, we propose a new permutation procedure that gives calibrated null distributions of genetic interactions from the neural network. Second, we study deep learning models in the low-data regime. We improve deep learning prediction by incorporating domain knowledge through informative priors. Specifically, we design informative Gaussian scale mixture priors that explicitly encode prior beliefs about feature sparsity and the signal-to-noise ratio of the data into deep learning models, which improves their accuracy on regression tasks, such as genotype-phenotype prediction, especially when only a small training set is available. Moreover, we use representation similarity to better understand the working mechanisms of low-data deep learning models that share knowledge across similar domains, as in transfer learning. We find that current representation similarity measures, applied to models trained on multiple domains, can give counter-intuitive conclusions about functional similarity due to the confounding effect of the input data structure. We therefore introduce a deconfounding step that adjusts for this confounder and improves the consistency of representation similarities with the functional similarities of the models.
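The permutation idea can be sketched generically as below: repeatedly break the genotype-phenotype link by permuting the phenotype and recompute the interaction statistic to build an empirical null distribution. This is a plain permutation-test skeleton, not the tailored procedure developed in the dissertation for neural-network interaction scores; the statistic `product_score` and the toy data are purely illustrative.

```python
import numpy as np

def permutation_null(interaction_score, genotypes, phenotype, n_perm=1000, seed=0):
    """Empirical null distribution of an interaction statistic obtained by
    recomputing it on permuted phenotype vectors."""
    rng = np.random.default_rng(seed)
    observed = interaction_score(genotypes, phenotype)
    null = np.array([
        interaction_score(genotypes, rng.permutation(phenotype))
        for _ in range(n_perm)
    ])
    # one-sided empirical p-value with the usual +1 correction
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(500, 2)).astype(float)   # two toy SNPs coded 0/1/2
phenotype = genotypes[:, 0] * genotypes[:, 1] + rng.normal(size=500)

def product_score(g, y):
    # toy statistic: correlation between the SNP product term and the phenotype
    return abs(np.corrcoef(g[:, 0] * g[:, 1], y)[0, 1])

print(permutation_null(product_score, genotypes, phenotype))
```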
- Natural Language Processing for Healthcare: Text Representation, Multitask Learning, and Applications
School of Science | Doctoral dissertation (article-based) (2023) Ji, Shaoxiong
The emergence of deep learning algorithms in natural language processing has boosted the development of intelligent medical information systems. Firstly, this dissertation explores effective text encoding for clinical text. We propose a dilated convolutional attention network that captures complex medical patterns in long clinical notes by exponentially increasing the receptive field with the dilation size. Furthermore, we propose to use embedding injection and gated information propagation in the medical note encoding module for better representation learning of lengthy clinical text. To capture the interaction between notes and medical codes, we explicitly model their underlying dependencies and use textual descriptions of medical codes as external knowledge. We also adopt contextualized graph embeddings to learn contextual information and causal relationships between text mentions, such as drugs taken and adverse reactions. We further conduct an empirical analysis of the effectiveness of transfer learning with pretrained language models for clinical text encoding and medical code prediction. We develop a hierarchical encoding model to equip pretrained language models with the capacity to encode long clinical notes, and we study the effect of pretraining in different domains and with different strategies. The comprehensive quantitative analysis shows that hierarchical encoding can, to some extent, capture interactions between distant words. Secondly, this dissertation investigates the multitask learning paradigm and its applications to healthcare. Multitask learning, motivated by how humans draw on previous tasks when learning a new one, makes full use of the information in each task and shares information between related tasks through common parameters. We adopt multitask learning for medical code prediction and demonstrate the benefits of leveraging multiple coding schemes. We design a recalibrated aggregation module that generates higher-quality, less noisy clinical document features in the shared modules of multitask networks. Finally, we consider task context to improve multitask learning for healthcare. We propose a domain-adaptive pretrained model and hypernetwork-guided multitask heads to learn shared representation modules and task-specific predictors. The domain-adaptive model is pretrained directly in the target domain of clinical applications, and task embeddings serving as task context are used to generate task-specific parameters via hypernetworks. Experiments show that the proposed hypernetwork-guided multitask learning method achieves better predictive performance and that semantic task information improves the generalizability of the task-conditioned multitask model.
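As a hypothetical sketch of the dilated-convolution component only (the attention part of the dilated convolutional attention network is omitted), the PyTorch snippet below stacks 1-D convolutions whose dilation doubles at each layer, so the receptive field over a long clinical note grows exponentially with depth. Layer sizes and names are made up for illustration.

```python
import torch
import torch.nn as nn

class DilatedTextEncoder(nn.Module):
    """Stack of 1-D convolutions whose dilation doubles at each layer, so the
    receptive field over the token sequence grows exponentially with depth."""
    def __init__(self, emb_dim=128, hidden=128, n_layers=4, kernel_size=3):
        super().__init__()
        layers = []
        for i in range(n_layers):
            dilation = 2 ** i
            layers += [
                nn.Conv1d(emb_dim if i == 0 else hidden, hidden, kernel_size,
                          dilation=dilation,
                          padding=dilation * (kernel_size - 1) // 2),  # keep sequence length
                nn.ReLU(),
            ]
        self.conv = nn.Sequential(*layers)

    def forward(self, token_embeddings):           # (batch, seq_len, emb_dim)
        x = token_embeddings.transpose(1, 2)       # Conv1d expects (batch, channels, seq_len)
        return self.conv(x).transpose(1, 2)        # (batch, seq_len, hidden)

encoder = DilatedTextEncoder()
notes = torch.randn(2, 2000, 128)                  # two toy "clinical notes" of 2000 tokens
print(encoder(notes).shape)                        # torch.Size([2, 2000, 128])
```

With four layers and kernel size 3, each output position already aggregates tokens tens of positions away, which is the property the abstract relies on for encoding long notes without attention over the full sequence.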