The Aalto system based on fine-tuned AudioSet features for DCASE2018 task2 - general purpose audio tagging

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Xu, Zhicun
dc.contributor.author Smit, Peter
dc.contributor.author Kurimo, Mikko
dc.date.accessioned 2018-12-21T10:30:48Z
dc.date.available 2018-12-21T10:30:48Z
dc.date.issued 2018-11
dc.identifier.citation Xu, Z, Smit, P & Kurimo, M 2018, The Aalto system based on fine-tuned AudioSet features for DCASE2018 task2 - general purpose audio tagging. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), 29, Tampere University of Technology, pp. 24-28, Detection and Classification of Acoustic Scenes and Events, Surrey, United Kingdom, 19/11/2018. en
dc.identifier.isbn 978-952-15-4262-6
dc.identifier.other PURE UUID: 5ce018ed-dfaa-4d8d-9966-9fc15f635869
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/the-aalto-system-based-on-finetuned-audioset-features-for-dcase2018-task2--general-purpose-audio-tagging(5ce018ed-dfaa-4d8d-9966-9fc15f635869).html
dc.identifier.other PURE LINK: http://dcase.community/documents/workshop2018/proceedings/DCASE2018Workshop_Xu_29.pdf
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/30233157/DCASE2018Workshop_Xu_29.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/35661
dc.description.abstract In this paper, we present a neural network system for DCASE 2018 task 2, general purpose audio tagging. We fine-tuned the Google AudioSet feature generation model with different settings for the given 41 classes on top of a fully connected layer with 100 units. We then used the fine-tuned models to generate 128-dimensional features for each 0.960 s segment of audio. We tried different neural network structures, including LSTM and multi-level attention models. In our experiments, the multi-level attention model outperformed the others. We also applied silence truncation, repeating and splitting the audio into fixed-length segments, pitch-shifting augmentation, and mixup. The proposed system achieved a MAP@3 score of 0.936, which outperforms the baseline result of 0.704 and ranks in the top 8% on the public leaderboard. en
dc.format.extent 5
dc.format.extent 24-28
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartof Detection and Classification of Acoustic Scenes and Events en
dc.relation.ispartofseries Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018) en
dc.rights openAccess en
dc.subject.other 213 Electronic, automation and communications engineering, electronics en
dc.title The Aalto system based on fine-tuned AudioSet features for DCASE2018 task2 - general purpose audio tagging en
dc.type A4 Artikkeli konferenssijulkaisussa (A4 Article in a conference publication) fi
dc.description.version Peer reviewed en
dc.contributor.department Centre of Excellence in Computational Inference, COIN
dc.contributor.department Department of Signal Processing and Acoustics
dc.contributor.department Department of Signal Processing and Acoustics en
dc.subject.keyword 213 Electronic, automation and communications engineering, electronics
dc.identifier.urn URN:NBN:fi:aalto-201812216670
dc.type.version publishedVersion
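
The MAP@3 score quoted in the abstract can be illustrated with a minimal sketch. It assumes that each clip has exactly one true label and that a system submits up to three ranked guesses per clip, so a clip scores 1, 1/2, or 1/3 depending on the rank of the correct guess, and 0 if it is absent. The function name `map_at_3` and the example labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def map_at_3(true_labels: Sequence[str],
             ranked_predictions: Sequence[Sequence[str]]) -> float:
    """Mean average precision at 3, assuming one true label per clip."""
    if len(true_labels) != len(ranked_predictions):
        raise ValueError("need one ranked prediction list per true label")
    if not true_labels:
        raise ValueError("at least one clip is required")

    total = 0.0
    for truth, ranked in zip(true_labels, ranked_predictions):
        # Only the first three guesses count; stop at the first correct one.
        for rank, guess in enumerate(ranked[:3], start=1):
            if guess == truth:
                total += 1.0 / rank
                break
    return total / len(true_labels)


# Hypothetical labels: correct at rank 1, correct at rank 2, and a miss.
truths = ["Bark", "Flute", "Cough"]
guesses = [
    ["Bark", "Meow", "Cough"],
    ["Oboe", "Flute", "Clarinet"],
    ["Laughter", "Applause", "Knock"],
]
score = map_at_3(truths, guesses)  # (1 + 1/2 + 0) / 3
```

Under these assumptions, a score of 0.936 means that the correct label was ranked first for the large majority of clips.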


