Browsing by Author "Szyller, Sebastian"
Now showing 1 - 4 of 4
- Adversary Detection in Online Machine Learning Systems
Perustieteiden korkeakoulu (School of Science) | Master's thesis (2020-03-16) Szyller, Sebastian
Machine learning applications have become increasingly popular. At the same time, model training has become an expensive task in terms of computational power, amount of data, and human expertise. As a result, models now constitute intellectual property and a business advantage to model owners, and thus their confidentiality must be preserved. Recently, it was shown that models can be stolen via model extraction attacks that do not require physical white-box access to the model but merely a black-box prediction API. A stolen model can be used to avoid paying for the service or even to undercut the offering of the legitimate model owner. Hence, it deprives the victim of the accumulated business advantage. In this thesis, we introduce two novel defense methods designed to detect distinct classes of model extraction attacks.
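The snippet below is a minimal sketch of the black-box extraction threat model that such detection defenses target: the adversary only queries a prediction API and trains a surrogate on the returned labels. The victim model, API wrapper, and dataset here are illustrative scikit-learn stand-ins, not the setup used in the thesis.

```python
# Toy black-box model extraction: query an API, train a surrogate on its labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
victim = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300,
                       random_state=0).fit(X[:1000], y[:1000])

def prediction_api(queries):
    """Stand-in for the victim owner's black-box prediction API (labels only)."""
    return victim.predict(queries)

# Adversary: label its own query set via the API ...
attack_queries = X[1000:]
stolen_labels = prediction_api(attack_queries)

# ... and train a surrogate that mimics the victim's functionality.
surrogate = DecisionTreeClassifier(random_state=0).fit(attack_queries, stolen_labels)
agreement = np.mean(surrogate.predict(X[:1000]) == victim.predict(X[:1000]))
print(f"surrogate agrees with victim on {agreement:.1%} of queries")
```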
- Detecting organized eCommerce fraud using scalable categorical clustering
A4 Article in conference proceedings (2019) Marchal, Samuel; Szyller, Sebastian
Online retail (eCommerce) frequently falls victim to fraud conducted by malicious customers (fraudsters) who obtain goods or services through deception. Fraud coordinated by groups of professional fraudsters who place several fraudulent orders to maximize their gain is referred to as organized fraud. Existing approaches to fraud detection typically analyze orders in isolation and are not effective at identifying groups of fraudulent orders linked to organized fraud. They also wrongly identify many legitimate orders as fraud, which hinders their usage for automated fraud cancellation. We introduce a novel solution to detect organized fraud by analyzing orders in bulk. Our approach is based on clustering and aims to group together fraudulent orders placed by the same group of fraudsters. It selectively uses two existing techniques, agglomerative clustering and sampling, to recursively group orders into small clusters in a reasonable amount of time. We assess our clustering technique on real-world orders placed on the Zalando website, the largest online apparel retailer in Europe. Our clustering processes hundreds of thousands of orders in a few hours and groups 35-45% of fraudulent orders together. We propose a simple technique built on top of our clustering that detects 26.2% of fraud while raising false alarms for only 0.1% of legitimate orders.
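A rough sketch of the core ingredient, agglomerative clustering over categorical order attributes, is shown below. The feature encoding, toy data, and distance threshold are invented for illustration; the paper's actual pipeline additionally uses sampling and recursion to scale to hundreds of thousands of orders.

```python
# Group near-identical orders (by categorical attributes) into small clusters.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import AgglomerativeClustering

orders = np.array([
    # delivery_country, payment_method, device, email_domain (label-encoded)
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [0, 1, 1, 0],
    [3, 0, 0, 2],
    [3, 0, 0, 1],
])

# Hamming distance = fraction of categorical attributes that differ.
dist = squareform(pdist(orders, metric="hamming"))

clusterer = AgglomerativeClustering(
    n_clusters=None,
    metric="precomputed",      # sklearn >= 1.2; older versions call this `affinity`
    linkage="average",
    distance_threshold=0.5,    # merge orders that share most attributes
)
labels = clusterer.fit_predict(dist)
print(labels)  # small clusters of near-identical orders are fraud candidates
```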
- Guarantees of Differential Privacy in Overparameterised Models
Perustieteiden korkeakoulu (School of Science) | Master's thesis (2021-08-23) Micozzi, Eleonora
Deep Learning (DL) has become increasingly popular in recent years. While DL models can achieve high levels of accuracy, due to their dimensionality they also tend to leak information about the data points in their training dataset. This leakage is mainly caused by overfitting, the tendency of Machine Learning (ML) models to behave differently on their training set than on their test set. Overfitted models are prone to privacy leaks because they do not generalize well and memorize information about their training data. Differential Privacy (DP) has been adopted as the de facto standard for data privacy in ML. DP is normally applied to ML models through a process called Differentially Private Stochastic Gradient Descent (DP-SGD), which adds noise to the gradient update step, limiting the effect any data sample has on the model. Since DP protects data points by limiting their effect on the model, it is also considered a strong defence against Membership Inference Attacks (MIAs). MIAs are a class of attacks against the privacy of ML models that aim to infer whether a data point was part of the training set of a target model. This information is sensitive and therefore needs to be protected. This thesis explores the relationship between differential privacy and membership inference attacks, and the effect overfitting has on privacy leakage. We test the effectiveness of DP as a defence against MIAs by analyzing and reproducing three state-of-the-art MIAs, and evaluating them on models trained with different privacy budgets as well as without DP. Our results show that differential privacy is an effective defence against membership inference attacks, reducing their effectiveness significantly with respect to non-private models.
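The following is an illustrative NumPy sketch of the DP-SGD update described above: clip each per-example gradient, add Gaussian noise, then average. Parameter values are arbitrary and there is no privacy accountant tracking the budget (epsilon, delta), so this is a conceptual sketch rather than a usable implementation (real training would use a library such as Opacus).

```python
import numpy as np

def dp_sgd_step(weights, per_example_grads, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.1, rng=np.random.default_rng(0)):
    """One differentially private gradient step on a flat weight vector."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Bound each sample's influence by clipping its gradient norm.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    clipped = np.stack(clipped)
    # Gaussian noise calibrated to the clipping norm hides individual samples.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_example_grads)
    return weights - lr * noisy_mean

w = np.zeros(3)
grads = [np.array([0.5, -2.0, 1.0]), np.array([3.0, 0.1, -0.4])]
print(dp_sgd_step(w, grads))
```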
- Ownership and Confidentiality in Machine Learning
School of Science | Doctoral dissertation (article-based) (2023) Szyller, Sebastian
Statistical and machine learning (ML) models have been the primary tools for data-driven analysis for decades. Recent theoretical progress in deep neural networks (DNNs), coupled with computational advances, has put DNNs at the forefront of ML in the domains of vision, audio and language understanding. Alas, this has made DNNs targets for a wide array of attacks. Their complexity has revealed a wider range of vulnerabilities compared to the much simpler models of the past. As of now, attacks have been proposed against every single step of the ML pipeline: gathering and preparation of data, model training, model serving and inference. In order to effectively build and deploy ML models, model builders invest vast resources into gathering, sanitising and labelling the data, designing and training the models, as well as serving them effectively to their customers. ML models embody valuable intellectual property (IP), and thus business advantage, that needs to be protected. Model extraction attacks aim to mimic the functionality of ML models, or even compromise their confidentiality. An adversary who extracts the model can leverage it for other attacks, continuously use the model without paying, or even undercut the original owner by providing a competing service at a lower cost.

All research questions investigated in this dissertation share the common theme of the theft of ML models or their functionality. The dissertation is divided into four parts. In the first part, I explore the feasibility of model extraction attacks. In the publications discussed in this part, my coauthors and I design novel black-box extraction attacks against classification and image-translation deep neural networks. Our attacks result in surrogate models that rival the victim models at their tasks. In the second part, we investigate ways of addressing the threat of model extraction; I propose two detection mechanisms able to identify ongoing extraction attacks in certain settings, with the following caveat: detection and prevention cannot stop a well-equipped adversary from extracting the model. Hence, in the third part, I focus on reliable ownership verification. By identifying extracted models and tracing them back to the victim, ownership verification can deter model extraction. In the publications discussed in this part, I demonstrate this by introducing the first watermarking scheme designed specifically against extraction attacks. Crucially, I critically evaluate the reliability of my approach with respect to the capabilities of an adaptive adversary. Further, I empirically evaluate a promising model fingerprinting scheme, and show that well-equipped adaptive adversaries remain a threat to model confidentiality. In the fourth part, I identify the problem of conflicting interactions among protection mechanisms. ML models are vulnerable to various attacks and thus may need to be deployed with multiple protection mechanisms at once. I show that combining ownership verification with protection mechanisms against other security/privacy concerns can result in conflicts. The dissertation concludes with my observations about model confidentiality, the feasibility of ownership verification, and potential directions for future work.
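The toy sketch below illustrates the general idea behind API-side watermarking against extraction that the ownership-verification part builds on: deterministically alter a small fraction of prediction responses and keep them as a trigger set for later verification. The hashing rule, watermark rate, and verification threshold here are invented for illustration and do not reproduce the scheme proposed in the dissertation.

```python
# Toy watermarking of a prediction API against model extraction.
import hashlib
import numpy as np

WATERMARK_RATE = 0.005   # fraction of queries that carry the watermark
trigger_set = []         # (query, watermarked label) pairs kept by the owner

def is_watermarked(query, key=b"owner-secret"):
    """Pseudo-randomly (but reproducibly) select queries to watermark."""
    digest = hashlib.sha256(key + query.tobytes()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < WATERMARK_RATE

def answer_query(query, true_label, n_classes=10):
    if is_watermarked(query):
        wm_label = (true_label + 1) % n_classes   # deliberately altered answer
        trigger_set.append((query, wm_label))     # remembered for verification
        return wm_label
    return true_label

def verify_ownership(suspect_predict, threshold=0.9):
    """Claim ownership if a suspect model reproduces most watermark labels."""
    hits = sum(suspect_predict(q) == lbl for q, lbl in trigger_set)
    return len(trigger_set) > 0 and hits / len(trigger_set) >= threshold

# Example: the owner serves queries through answer_query(); if an extracted
# model later surfaces, verify_ownership() is run against its predict function.
q = np.arange(4, dtype=np.float32)
print(answer_query(q, true_label=3))
```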