Dataset Watermarking

School of Science (Perustieteiden korkeakoulu) | Master's thesis
Date
2021-08-23
Major/Subject
Security and Cloud Computing
Mcode
SCI3113
Degree programme
Master’s Programme in Security and Cloud Computing (SECCLO)
Language
en
Pages
58
Abstract
Datasets are gaining importance and economic value because they are used for verification of publications, statistical analysis, and training machine learning models. Dataset owners should therefore be careful when they publish or preserve their datasets. For example, adversaries who have access to a dataset can train and monetize a machine learning model without authorization. In this case, a verification mechanism is useful for tracing the original owner of the dataset. Dataset watermarking techniques have been introduced to help dataset owners determine whether a model was trained on their dataset without authorization. These techniques are relatively new, and no prior work has investigated their robustness. This thesis evaluates the robustness of two watermarking techniques for image datasets. We show that the watermark can be detected and removed from the dataset by applying a decontamination process based on cosine distances among the embeddings of the samples in each class. A model trained on the sanitized dataset can evade ownership verification with a reduction of approximately 1-5 percentage points in test accuracy compared to a model trained on the clean dataset. In addition to evaluating dataset watermarking techniques, this thesis presents a survey of recent watermarking methods for machine learning models that are designed for various tasks and consider different adversary models. The survey also covers other model ownership verification techniques that do not rely on watermarking and instead use specific characteristics of the models or training sets to demonstrate model ownership.
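The abstract names the idea behind the decontamination step (cosine distances among per-class embeddings) but not its exact procedure. The following is only a minimal sketch of how such a filter could look, not the method used in the thesis. The function name `flag_suspicious_samples`, the mean-pairwise-distance score, the median/MAD outlier rule, and the threshold `z_thresh` are all assumptions made for illustration; the embeddings are assumed to come from some feature extractor that is not shown here.

```python
import numpy as np


def flag_suspicious_samples(embeddings, labels, z_thresh=3.5):
    """Flag samples whose embedding lies unusually far (in cosine distance)
    from the other samples of the same class.

    Hypothetical helper: the scoring rule and threshold are illustrative
    assumptions, not the procedure from the thesis.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # L2-normalise rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    suspicious = np.zeros(len(labels), dtype=bool)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        if len(idx) < 3:
            continue  # too few samples to estimate what is "typical"

        # Pairwise cosine distances within the class.
        dist = 1.0 - unit[idx] @ unit[idx].T
        # Mean distance to the *other* samples (self-distance is ~0).
        mean_dist = dist.sum(axis=1) / (len(idx) - 1)

        # Robust z-score based on the median and the median absolute deviation.
        med = np.median(mean_dist)
        mad = np.median(np.abs(mean_dist - med)) + 1e-12
        score = (mean_dist - med) / (1.4826 * mad)

        suspicious[idx] = score > z_thresh
    return suspicious


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 8
    # 20 "clean" samples clustered around one direction...
    clean = np.eye(dim)[0] + 0.05 * rng.standard_normal((20, dim))
    # ...and 2 samples pointing elsewhere, standing in for watermarked images.
    marked = np.eye(dim)[1] + 0.05 * rng.standard_normal((2, dim))
    X = np.vstack([clean, marked])
    y = np.zeros(len(X), dtype=int)

    mask = flag_suspicious_samples(X, y)
    print("flagged indices:", np.flatnonzero(mask))
```

Under these assumptions, a sanitized dataset would be obtained by dropping the flagged rows, for example `X[~mask]` and `y[~mask]`. A per-class, robust statistic is used in this sketch so that a small fraction of atypical samples does not shift the notion of "typical" for the class.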
Supervisor
Asokan, N.
Thesis advisor
Tekgul, Buse
Keywords
dataset watermarking, model watermarking, ownership verification, machine learning security