Applying DNN adaptation to reduce the session dependency of ultrasound tongue imaging-based silent speech interfaces
Access rights
openAccess
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
Date
2020-01-01
Language
en
Pages
16
109-124
Series
ACTA POLYTECHNICA HUNGARICA, Volume 17, issue 7
Abstract
Silent Speech Interfaces (SSI) perform articulatory-to-acoustic mapping to convert articulatory movement into synthesized speech. Their main goal is to aid the speech handicapped, or to be used as part of a communication system operating in silence-required environments or in those with high background noise. Although many previous studies addressed the speaker-dependency of SSI models, session-dependency is also an important issue due to the possible misalignment of the recording equipment. In particular, there are currently no solutions available in the case of tongue ultrasound recordings. In this study, we investigate the degree of session-dependency of standard feed-forward DNN-based models for ultrasound-based SSI systems. Besides examining the amount of training data required for speech synthesis parameter estimation, we also show that DNN adaptation can be useful for handling session dependency. Our results indicate that, by using adaptation, less training data and training time are needed to achieve the same speech quality than when training a new DNN from scratch. Our experiments also suggest that the sub-optimal cross-session behavior is caused by the misalignment of the recording equipment, as adapting just the lower, feature extractor layers of the neural network proved to be sufficient for achieving a comparable level of performance.
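The idea of adapting only the lower, feature extractor layers can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two-layer network, the layer sizes, the learning rate, the function names (`forward`, `mse`, `adapt_lower_layer`) and the synthetic "new session" data are all assumptions made for illustration. The upper layer stays frozen while gradient descent updates only the first layer.

```python
import numpy as np

def forward(params, X):
    """Toy two-layer feed-forward net: tanh hidden layer, linear output."""
    H = np.tanh(X @ params["W1"] + params["b1"])  # lower (feature extractor) layer
    return H, H @ params["W2"] + params["b2"]     # upper (regression) layer

def mse(params, X, Y):
    """Mean squared error of the network on (X, Y)."""
    _, pred = forward(params, X)
    return float(np.mean((pred - Y) ** 2))

def adapt_lower_layer(params, X, Y, lr=0.02, steps=300):
    """Full-batch gradient descent on W1/b1 only; W2/b2 are left frozen."""
    p = {k: v.copy() for k, v in params.items()}
    for _ in range(steps):
        H, pred = forward(p, X)
        d_pred = 2.0 * (pred - Y) / pred.size  # gradient of the MSE w.r.t. predictions
        d_H = d_pred @ p["W2"].T               # backpropagate through the frozen upper layer
        d_Z = d_H * (1.0 - H ** 2)             # derivative of tanh
        p["W1"] -= lr * (X.T @ d_Z)
        p["b1"] -= lr * d_Z.sum(axis=0)
    return p

# Hypothetical setup: random weights stand in for a model trained on an earlier
# session, and offset inputs stand in for data recorded in a new session.
rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(0.0, 0.3, (8, 16)), "b1": np.zeros(16),
    "W2": rng.normal(0.0, 0.1, (16, 1)), "b2": np.zeros(1),
}
X_new = rng.normal(size=(64, 8)) + 0.5
Y_new = np.sin(X_new.sum(axis=1, keepdims=True))

adapted = adapt_lower_layer(params, X_new, Y_new)
print("error before adaptation:", mse(params, X_new, Y_new))
print("error after adaptation: ", mse(adapted, X_new, Y_new))
```

In this sketch the frozen upper layer plays the role of the part of the model assumed to transfer across sessions, while the first layer absorbs the change in the input features.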
Keywords
Articulatory-to-acoustic mapping, Deep Neural Networks, DNN adaptation, Session dependency, Silent speech interfaces
Citation
Gosztolya, G, Grósz, T, Tóth, L, Markó, A & Csapó, T G 2020, 'Applying DNN adaptation to reduce the session dependency of ultrasound tongue imaging-based silent speech interfaces', Acta Polytechnica Hungarica, vol. 17, no. 7, pp. 109-124. https://doi.org/10.12700/APH.17.7.2020.7.6