Browsing by Author "Mohammadnia Qaraei, Mohammadreza"
- Adversarial examples for extreme multilabel text classification
A1 Original article in a scientific journal (2022-12) Mohammadnia Qaraei, Mohammadreza; Babbar, Rohit. Extreme Multilabel Text Classification (XMTC) is a text classification problem in which (i) the output space is extremely large, (ii) each data point may have multiple positive labels, and (iii) the data follows a strongly imbalanced distribution. With applications in recommendation systems and automatic tagging of web-scale documents, research on XMTC has focused on improving prediction accuracy and dealing with imbalanced data. However, the robustness of deep-learning-based XMTC models against adversarial examples has been largely underexplored. In this paper, we investigate the behaviour of XMTC models under adversarial attacks. To this end, we first define adversarial attacks in multilabel text classification problems. We categorize attacks on multilabel text classifiers as (a) positive-to-negative, where the target positive label should fall out of the top-k predicted labels, and (b) negative-to-positive, where the target negative label should enter the top-k predicted labels. Then, through experiments on APLC-XLNet and AttentionXML, we show that XMTC models are highly vulnerable to positive-to-negative attacks but more robust to negative-to-positive ones. Furthermore, our experiments show that the success rate of positive-to-negative adversarial attacks has an imbalanced distribution: tail classes are highly vulnerable, and an attacker can generate adversarial samples for them with high similarity to the actual data points. To overcome this problem, we explore the effect of rebalanced loss functions in XMTC, which not only increase accuracy on tail classes but also improve the robustness of these classes against adversarial attacks. The code for our experiments is available at https://github.com/xmc-aalto/adv-xmtc.
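The two attack goals reduce to simple success criteria on the classifier's score vector. A minimal sketch of those criteria (the function names and the NumPy interface are illustrative, not the paper's code):

```python
import numpy as np

def topk_labels(scores: np.ndarray, k: int) -> set:
    """Indices of the k highest-scoring labels."""
    return set(np.argpartition(-scores, k)[:k].tolist())

def positive_to_negative_success(adv_scores, target_positive, k):
    # Succeeds if the target positive label is pushed out of the top-k.
    return target_positive not in topk_labels(adv_scores, k)

def negative_to_positive_success(adv_scores, target_negative, k):
    # Succeeds if the target negative label climbs into the top-k.
    return target_negative in topk_labels(adv_scores, k)
```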
- Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels
A4 Article in conference proceedings (2021-04-19) Mohammadnia Qaraei, Mohammadreza; Schultheis, Erik; Gupta, Priyanshu; Babbar, Rohit. Extreme Classification (XC) refers to supervised learning where each training/test instance is labeled with a small subset of relevant labels chosen from a large set of possible target labels. The framework of XC has been widely employed in web applications such as automatic labeling of web encyclopedias, prediction of related searches, and recommendation systems. While most state-of-the-art models in XC achieve high overall accuracy by performing well on the frequently occurring labels, they perform poorly on the large number of infrequent (tail) labels. This arises from two statistical challenges: (i) missing labels, as it is virtually impossible to manually assign every relevant label to an instance, and (ii) a highly imbalanced data distribution in which a large fraction of the labels are tail labels. In this work, we consider common loss functions that decompose over labels and calculate unbiased estimates that compensate for missing labels, following Natarajan et al. [26]. This turns out to be disadvantageous from an optimization perspective, as important properties such as convexity and lower-boundedness are lost. To circumvent this problem, we use the fact that typical loss functions in XC are convex surrogates of the 0-1 loss, and thus propose to switch to convex surrogates of its unbiased version. These surrogates are further adapted to the label imbalance by combining them with label-frequency-based rebalancing. We show that the proposed loss functions can be easily incorporated into a variety of frameworks for extreme classification, including (i) linear classifiers, such as DiSMEC, on sparse input representations, (ii) the attention-based deep architecture AttentionXML, learned on dense GloVe embeddings, and (iii) the XLNet-based transformer model for extreme classification, APLC-XLNet. Our results demonstrate consistent improvements over the respective vanilla baselines on the propensity-scored precision and nDCG metrics.
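To see why unbiasedness costs convexity, consider a per-label loss under one-sided label noise: a true positive is observed with propensity p, while true negatives are never flipped. A sketch of the Natarajan-style unbiased estimator for binary cross-entropy (the interface is illustrative; the paper optimizes convex surrogates of the unbiased 0-1 loss rather than this estimator directly):

```python
import torch
import torch.nn.functional as F

def unbiased_bce(logit, observed_label, p):
    """Unbiased BCE under missing labels: the expectation of this estimator,
    given the true label, equals the clean BCE. p is the per-label propensity
    of observing a true positive."""
    ones = torch.ones_like(logit)
    zeros = torch.zeros_like(logit)
    loss_pos = F.binary_cross_entropy_with_logits(logit, ones, reduction="none")
    loss_neg = F.binary_cross_entropy_with_logits(logit, zeros, reduction="none")
    # The subtracted term below is what breaks convexity and lower-boundedness:
    # as loss_neg grows, the estimator is unbounded from below.
    pos_estimate = (loss_pos - (1 - p) * loss_neg) / p
    return torch.where(observed_label > 0, pos_estimate, loss_neg)
```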
- Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model
A4 Article in conference proceedings (2022) Rastas, Iiro; Ciarán Ryan, Yann; Tiihonen, Iiro; Mohammadnia Qaraei, Mohammadreza; Repo, Liina; Babbar, Rohit; Mäkelä, Eetu; Tolonen, Mikko; Ginter, Filip. In this paper, we describe a BERT model trained on the Eighteenth Century Collections Online (ECCO) dataset of digitized documents. The ECCO dataset poses unique modelling challenges due to the presence of Optical Character Recognition (OCR) artifacts. We establish the performance of the BERT model on a publication year prediction task against linear baseline models and human judgement, finding the BERT model to be superior to both and able to date the works with, on average, less than 7 years of absolute error. We also explore how language change over time affects the model by analyzing the features the model uses for its publication year predictions, as given by the Integrated Gradients model explanation method.
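Integrated Gradients attributes a prediction to input features by averaging gradients along a straight-line path from a baseline to the input. A minimal, model-agnostic sketch (for BERT the attribution is typically computed over input embeddings; the scalar-output assumption here is ours):

```python
import torch

def integrated_gradients(model, x, baseline, steps=50):
    """Riemann approximation of IG: (x - baseline) times the average
    gradient of the model output along the baseline-to-input path."""
    grads = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        model(point).sum().backward()  # assumes a scalar (or summed) output
        grads.append(point.grad.detach())
    return (x - baseline) * torch.stack(grads).mean(dim=0)
```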
- Meta-classifier free negative sampling for extreme multilabel classification
A1 Original article in a scientific journal (2024-02) Mohammadnia Qaraei, Mohammadreza; Babbar, Rohit. Negative sampling is a common approach for making the training of deep models tractable in classification problems with very large output spaces, known as extreme multilabel classification (XMC) problems. Negative sampling methods aim to find, for each instance, the negative labels with the highest scores, known as hard negatives, and limit the computation of the negative part of the loss to these labels. Two well-known approaches to negative sampling in XMC models are meta-classifier-based and Maximum Inner Product Search (MIPS)-based adaptive methods. Owing to their good prediction performance, methods that employ a meta classifier are more common in contemporary XMC research. On the flip side, they need to train and store the meta classifier (apart from the extreme classifier), which can involve millions of additional parameters. In this paper, we focus on MIPS-based methods for negative sampling. We highlight two issues that can prevent deep models trained with these methods from training stably. First, we argue that using hard negatives excessively from the beginning of training leads to unstable gradients. Second, we show that when all the negative labels in a MIPS-based method are restricted to those returned by MIPS, training is sensitive to the length of the intervals at which the weights are pre-processed for MIPS. To mitigate these issues, we propose to limit the labels selected by MIPS to only a few and to sample the remaining labels from a uniform distribution. We show that our proposed MIPS-based negative sampling can reach the performance of LightXML, a transformer-based model trained with a meta classifier, with no need to train and store any additional classifier. The code for our experiments is available at https://github.com/xmc-aalto/mips-negative-sampling.
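The proposed remedy is a mixed sampler: a few MIPS-retrieved hard negatives plus uniformly drawn labels. A rough sketch of that mixing step (function and argument names are our own, not the repository's API):

```python
import numpy as np

def mixed_negatives(mips_candidates, positives, num_labels, n_hard, n_uniform, rng):
    """Keep only a few MIPS hard negatives; fill the rest of the budget
    with labels drawn uniformly from the full label space."""
    positives = set(positives)
    hard = [l for l in mips_candidates if l not in positives][:n_hard]
    exclude = positives.union(hard)
    # Draw a few extras so filtering collisions rarely shrinks the sample.
    draws = rng.choice(num_labels, size=n_uniform + len(exclude), replace=False)
    uniform = [int(l) for l in draws if l not in exclude][:n_uniform]
    return hard + uniform
```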
- Randomized non-linear PCA networks
A1 Original article in a scientific journal (2021-02-04) Mohammadnia Qaraei, Mohammadreza; Abbaasi, Saeid; Ghiasi-Shirazi, Kamaledin. PCANet is an unsupervised Convolutional Neural Network (CNN) which uses Principal Component Analysis (PCA) to learn convolutional filters. One drawback of PCANet is that linear PCA cannot capture nonlinear structures within the data. A straightforward remedy is to use kernel methods, equipping the PCA step in PCANet with a kernel function; however, this leads to a network with cubic complexity in the number of training image patches. In this paper, we propose a network called Randomized Nonlinear PCANet (RNPCANet), which uses explicit kernel PCA to learn the convolutional filters. Although RNPCANet employs kernel methods for nonlinear processing of the data, it uses kernel approximation techniques to define an explicit feature space in each stage, and we theoretically show that its complexity is not much higher than that of PCANet. We also show that our method links PCANets to Convolutional Kernel Networks (CKNs), as the proposed model maps patches to a kernel feature space in a manner similar to CKNs. We evaluate our model on image recognition tasks on the Coil-20, Coil-100, ETH-80, Caltech-101, MNIST, and C-Cube datasets. The experimental results show that the proposed method outperforms PCANet and CKNs in terms of recognition accuracy.
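One way to realize such a stage is to map patches through a randomized kernel approximation and then run ordinary PCA in the resulting explicit feature space. A sketch using random Fourier features for an RBF kernel (the RBF choice and all shapes are our assumptions, not necessarily the paper's exact construction):

```python
import numpy as np

def rff_pca_stage(patches, n_filters, n_features, gamma, rng):
    """patches: (n, d) array of flattened image patches.
    Returns the random feature map (W, b) and the top principal
    directions, which play the role of PCANet's convolutional filters."""
    d = patches.shape[1]
    # Random Fourier features approximating k(x, y) = exp(-gamma * ||x - y||^2).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Z = np.sqrt(2.0 / n_features) * np.cos(patches @ W + b)
    Z -= Z.mean(axis=0)                      # center in feature space
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return W, b, Vt[:n_filters]              # project new patches with the same map
```

Because the feature map is explicit, the PCA step costs on the order of n * n_features^2 rather than the cubic-in-n cost of exact kernel PCA.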
- Why state-of-the-art deep learning barely works as good as a linear classifier in extreme multi-label text classification
A4 Article in conference proceedings (2020) Mohammadnia Qaraei, Mohammadreza; Khandagale, Sujay; Babbar, Rohit. Extreme Multi-label Text Classification (XMTC) refers to supervised learning of a classifier which can predict a small subset of relevant labels for a document from an extremely large set. Even though deep learning algorithms have surpassed linear and kernel methods on most natural language processing tasks over the last decade, recent works show that state-of-the-art deep learning methods can only barely manage to work as well as a linear classifier on the XMTC task. The goal of this work is twofold: (i) to investigate the reasons for the comparable performance of these two strands of methods for XMTC, and (ii) to document this observation explicitly, as the efficacy of linear classifiers in this regime has been ignored in many relevant recent works.