Two-Stage Overfitting of Neural Network-Based Video Coding In-Loop Filter

dc.contributor: Aalto University [en]
dc.contributor.advisor: Cricri, Francesco
dc.contributor.author: Jánosi, József-Hunor
dc.contributor.school: Perustieteiden korkeakoulu [fi]
dc.contributor.supervisor: Kannala, Juho
dc.description.abstract: Modern video coding standards such as Versatile Video Coding (VVC) produce compression artefacts because of their block-based, lossy compression techniques. These artefacts are mitigated to an extent by the in-loop filters inside the coding process. Neural Network (NN) based in-loop filters are being explored for this denoising task, and in recent studies these NN-based loop filters are overfitted on the test content to make them content-adaptive and to further enhance the visual quality of the video frames while balancing the trade-off between quality and bitrate. The loop filter is a relatively low-complexity Convolutional Neural Network (CNN) that is pretrained on a general video dataset and then fine-tuned on the video that needs to be encoded. Only a set of parameters inside the CNN architecture, named multipliers, is fine-tuned, so the bitrate overhead signalled to the decoder is kept small. The resulting weight update is compressed using the Neural Network Compression and Representation (NNR) standard. In this project, an exploration of high-performing hyperparameters was conducted, and a two-stage training process was employed to potentially further increase the coding efficiency of the in-loop filter. A first-stage model was overfitted on the test video sequence and used to identify the patches of the dataset on which it improved the quality of the unfiltered video data; the second-stage model was then overfitted only on the patches that provided a gain. The model with the best-found hyperparameters saved on average 1.01% (Y), 4.28% (Cb), and 3.61% (Cr) Bjøntegaard Delta rate (BD-rate) compared to the VVC Test Model (VTM) 11.0 NN-based Video Coding (NNVC) 5.0 under the Random Access (RA) Common Test Conditions (CTC). Although the second-stage model also outperformed the VTM, it underperformed the first-stage model by about 0.20% (Y), 0.23% (Cb), and 0.18% (Cr) BD-rate, due to the high bitrate overhead created by the second-stage model. [en]
dc.programme: Master's Programme in ICT Innovation [fi]
dc.programme.major: Data Science [fi]
dc.subject.keyword: neural video coding [en]
dc.subject.keyword: in-loop filter [en]
dc.subject.keyword: two-stage training [en]
dc.title: Two-Stage Overfitting of Neural Network-Based Video Coding In-Loop Filter [en]
dc.type: G2 Pro gradu, diplomityö [fi]
dc.type.ontasot: Master's thesis [en]