Foundation model for detecting AI-generated visual content

School of Science | Master's thesis

Language

en

Pages

81

Abstract

The vast quantity of synthetic content produced by modern, accessible generative AI models, combined with the lack of robust detection tools, presents a significant danger to information integrity across the internet. This study introduces a comprehensive foundation model approach to deepfake detection, demonstrating strong performance across a diverse range of generative models. Following the philosophy of foundation model development, we document the complete architectural evolution, from traditional CNN baselines through multi-modal ensembles to our final PE-Giant adaptation, offering a systematic roadmap for the subject. Trained on a self-generated 5.2M-sample synthetic dataset covering text-to-image, image-to-image, text-to-video and image-to-video generation methods, our model achieves state-of-the-art results in distinguishing authentic, fully-synthetic and partially-synthetic content. Most importantly, the model demonstrates robust cross-domain generalization, maintaining high accuracy not only on unseen data from models of the same architectural family but also on content generated via entirely separate synthesis paradigms. We also organised a 48-hour AI hackathon in Finland to bring attention to the problem.
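The three-way classification task described in the abstract (authentic vs. fully-synthetic vs. partially-synthetic) is commonly framed as a lightweight head on top of a frozen foundation-model encoder. The sketch below is a hypothetical illustration of that pattern, not the thesis's actual PE-Giant implementation: the stand-in encoder, embedding dimension, and class count are assumptions made so the snippet runs end to end.

```python
import torch
import torch.nn as nn

class DeepfakeClassifier(nn.Module):
    """Frozen vision encoder + linear head for a 3-way task:
    authentic / fully-synthetic / partially-synthetic (assumed labels)."""

    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int = 3):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():  # freeze the foundation backbone
            p.requires_grad = False
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.encoder(images)     # (batch, embed_dim) embeddings
        return self.head(feats)              # (batch, num_classes) logits

# Tiny stand-in encoder (NOT PE-Giant) so the sketch is self-contained.
toy_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
model = DeepfakeClassifier(toy_encoder, embed_dim=64)

logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 3])
```

Freezing the backbone and training only the head is one standard way to adapt a large pretrained encoder to a downstream detection task; whether the thesis fine-tunes the full model or only a head is not stated in the abstract.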

Supervisor

Marttinen, Pekka

Thesis advisor

Ilin, Alexander
