Dataset Watermarking

School of Science (Perustieteiden korkeakoulu) | Master's thesis

Date

2021-08-23

Major/Subject

Security and Cloud Computing

Mcode

SCI3113

Degree programme

Master’s Programme in Security and Cloud Computing (SECCLO)

Language

en

Pages

58

Abstract

Datasets are gaining importance and economic value, since they are used for verifying publications, for statistical analysis, and for training machine learning models. Dataset owners should therefore be careful when they publish or preserve their datasets. For example, an adversary who has access to a dataset can train a machine learning model on it and monetize that model without authorization. In this case, a verification mechanism is useful for tracing the dataset back to its original owner. Dataset watermarking techniques have been introduced to help dataset owners determine whether a model was trained on their dataset without authorization. Because these techniques are relatively new, no work has yet investigated their robustness. This thesis evaluates the robustness of two watermarking techniques for image datasets. We show that the watermark can be detected and removed from the dataset by applying a decontamination process based on cosine distances among the embeddings of the samples in each class. A model trained on the sanitized dataset can evade ownership verification at the cost of a reduction in test accuracy of approximately 1 to 5 percentage points compared to a model trained on the clean dataset.

In addition to the evaluation of dataset watermarking techniques, this thesis presents a survey of recent watermarking methods for machine learning models that are designed for various tasks and consider different adversary models. The survey also covers model ownership verification techniques that are not based on watermarking and instead rely only on specific characteristics of the models or training sets to demonstrate ownership.
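The abstract names the idea behind the decontamination step (cosine distances among the embeddings of the samples in each class) but not its exact rule. The following is a minimal sketch under stated assumptions, not the thesis's procedure: embeddings are assumed to come from some feature extractor, each sample is scored by its mean cosine distance to the other samples of its class, and a hypothetical z-score threshold (z_thresh) decides which samples to drop. The function names cosine_distances, flag_suspicious and sanitize are illustrative only.

```python
import numpy as np


def cosine_distances(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine distances (1 - cosine similarity) between the rows of emb."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero-length vectors
    return 1.0 - unit @ unit.T


def flag_suspicious(embeddings: np.ndarray, labels: np.ndarray,
                    z_thresh: float = 3.0) -> np.ndarray:
    """Boolean mask of samples whose mean cosine distance to the other samples
    of the same class is unusually large (hypothetical z-score rule)."""
    mask = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < 3:  # too few samples to estimate a spread
            continue
        d = cosine_distances(embeddings[idx])
        # Average distance to the other class members (diagonal entries are ~0).
        mean_d = d.sum(axis=1) / (len(idx) - 1)
        z = (mean_d - mean_d.mean()) / (mean_d.std() + 1e-12)
        mask[idx[z > z_thresh]] = True
    return mask


def sanitize(embeddings: np.ndarray, labels: np.ndarray,
             z_thresh: float = 3.0) -> np.ndarray:
    """Indices of the samples kept after dropping the flagged ones."""
    return np.flatnonzero(~flag_suspicious(embeddings, labels, z_thresh))


if __name__ == "__main__":
    # Toy 2-D "embeddings": class 0 clusters near angle 0, class 1 near pi/2,
    # and sample 20 is labelled class 0 but points in class 1's direction.
    a0 = np.linspace(-0.05, 0.05, 20)
    a1 = np.pi / 2 + np.linspace(-0.05, 0.05, 20)
    angles = np.concatenate([a0, [np.pi / 2], a1])
    emb = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    labels = np.array([0] * 21 + [1] * 20)
    print(np.flatnonzero(flag_suspicious(emb, labels)))  # [20]
```

Scoring within each class keeps the comparison local: a marked sample only needs to look unusual relative to its own class, not to the whole dataset. The threshold trades off how many marked samples are removed against how many clean samples are discarded, which in turn affects the accuracy of a model trained on the sanitized data.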

Supervisor

Asokan, N.

Thesis advisor

Tekgul, Buse

Keywords

dataset watermarking, model watermarking, ownership verification, machine learning security
