The Aalto system based on fine-tuned AudioSet features for DCASE2018 task2 - general purpose audio tagging


Access rights

openAccess
publishedVersion


A4 Article in conference proceedings


Language

en

Pages

5

Series

Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), pp. 24-28

Abstract

In this paper, we present a neural network system for DCASE 2018 task 2, general-purpose audio tagging. We fine-tuned the Google AudioSet feature-generation model with different settings for the given 41 classes, with a fully connected layer of 100 units on top. We then used the fine-tuned models to generate 128-dimensional features for each 0.960 s audio segment. We evaluated several neural network structures, including LSTM and multi-level attention models; in our experiments, the multi-level attention model outperformed the others. Truncating silent parts, repeating and splitting audio into fixed-length segments, pitch-shift augmentation, and mixup were all used in our experiments. The proposed system achieved a MAP@3 score of 0.936, which outperforms the baseline result of 0.704 and ranks in the top 8% of the public leaderboard.
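The MAP@3 metric reported above (the official metric of the underlying Kaggle competition, where each clip has a single ground-truth label and up to three ranked predictions are scored) can be sketched as follows; the function name and example data are illustrative, not from the paper:

```python
def map_at_3(y_true, y_pred_ranked):
    """Mean Average Precision @ 3 for single-label audio tagging.

    y_true: list of true class indices, one per clip.
    y_pred_ranked: list of up to 3 predicted class indices per clip,
    ordered from most to least confident.

    With a single true label per clip, AP@3 reduces to 1/rank of the
    correct label if it appears in the top 3, and 0 otherwise.
    """
    total = 0.0
    for truth, preds in zip(y_true, y_pred_ranked):
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == truth:
                total += 1.0 / rank
                break  # only the first correct hit counts
    return total / len(y_true)

# Example: clip 1 correct at rank 1, clip 2 at rank 3, clip 3 missed.
score = map_at_3([0, 5, 2], [[0, 1, 2], [9, 8, 5], [7, 6, 4]])
print(score)  # (1 + 1/3 + 0) / 3 ≈ 0.444
```

A perfect submission (every true label ranked first) scores 1.0, so the system's 0.936 means the correct class is almost always the top prediction.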

Description

openaire: EC/H2020/780069/EU//MeMAD


Citation

Xu, Z., Smit, P. & Kurimo, M. 2018, "The Aalto system based on fine-tuned AudioSet features for DCASE2018 task2 - general purpose audio tagging", in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), no. 29, Tampere University of Technology, pp. 24-28. Detection and Classification of Acoustic Scenes and Events, Surrey, United Kingdom, 19/11/2018. <http://dcase.community/documents/workshop2018/proceedings/DCASE2018Workshop_Xu_29.pdf>