Cognitive and probabilistic basis of prominence perception in speech
School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2017-05-26
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Department of Signal Processing and Acoustics
91 + app. 142
Aalto University publication series DOCTORAL DISSERTATIONS, 88/2017
Abstract
This thesis examines the cognitive and probabilistic nature of prominence perception in speech. In recent years, a growing number of studies in linguistics, phonetics, and neuroscience have provided evidence that (i) prominence is related to attention- and expectation-based factors, (ii) frequency and predictability effects play an important role in language processing, accounting for several linguistic phenomena, and (iii) the human brain represents information probabilistically, with humans behaving as optimal probabilistic observers. On the basis of this evidence, the relationship between prominence, attention, and predictability is explored. The proposed hypothesis is that prominence perception in speech is connected with the unpredictability of prosodic features that draw the listeners' attention to the surprising aspects of the input. The thesis consists of a series of computational and behavioral studies that investigate different aspects of the prominence–attention–predictability triad. The core idea throughout is to investigate the probabilistic relations in the acoustic prosodic domain through statistical modeling of the acoustic correlates of prominence, examining their relationship with the concurrent prominent and non-prominent units. Because the probabilistic view of prominence also implies that listeners use some type of statistical learning mechanism operating at the suprasegmental acoustic prosodic level, a number of behavioral experiments are also conducted. These experiments aim to determine whether human listeners are sensitive to the statistical regularities of suprasegmental speech acoustics and, if so, to what extent. A basic application of statistical models to the automatic detection of prominence in speech is also reported.
Together, these studies show that predictability at the acoustic prosodic level is strongly correlated with human listeners' perception of prominence in speech. This statistical connection, however, is not fixed: it depends on the listeners' experience with the language and thereby on their subjective expectations of prosodic outcomes. This is illustrated by results showing that the human perceptual system appears to adapt quickly to the suprasegmental probabilistic structure of the incoming speech, causing the prosodic patterns that are less frequent in the recent discourse-specific acoustics to be perceived as more prominent. The experiments thus indicate a type of statistical learning mechanism operating at the suprasegmental acoustic level. Finally, a practical application of the predictability framework to the unsupervised detection of prominence in speech is described. Experiments in several languages show that the method achieves high agreement with human judgments of prominence despite having no access to prominence labels during training of the detector.
Supervising professor: Laine, Unto K., Prof. Emer., Aalto University, Department of Signal Processing and Acoustics, Finland
Alku, Paavo, Acad. Prof., Aalto University, Department of Signal Processing and Acoustics, Finland
Thesis advisor: Räsänen, Okko, Dr., Aalto University, Department of Signal Processing and Acoustics, Finland
Keywords: prosody, prominence, attention, speech perception, statistical learning, stimulus predictability, speech analysis, cognitive modeling
[Publication 1]: Sofoklis Kakouros, Okko Räsänen, and Unto K. Laine. Attention Based Temporal Filtering of Sensory Signals for Data Redundancy Reduction. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-2013), pp. 3188–3192, Vancouver, Canada, May 2013.
DOI: 10.1109/ICASSP.2013.6638246
[Publication 2]: Sofoklis Kakouros and Okko Räsänen. Perception of Sentence Stress in English Infant Directed Speech. In 15th Annual Conference of the International Speech Communication Association (Interspeech-2014), pp. 1821–1825, Singapore, September 2014.
[Publication 3]: Sofoklis Kakouros and Okko Räsänen. Perception of Sentence Stress in Speech Correlates with the Temporal Unpredictability of Prosodic Features. Cognitive Science, 40(7), 1739–1774, September 2016.
DOI: 10.1111/cogs.12306
[Publication 4]: Sofoklis Kakouros and Okko Räsänen. Analyzing the Predictability of Lexeme-specific Prosodic Features as a Cue to Sentence Prominence. In 37th Annual Conference of the Cognitive Science Society (CogSci-2015), pp. 1039–1044, Pasadena, California, July 2015.
[Publication 5]: Sofoklis Kakouros, Joris Pelemans, Lyan Verwimp, Patrick Wambacq, and Okko Räsänen. Analyzing the Contribution of Top-down Lexical and Bottom-up Acoustic Cues in the Detection of Sentence Prominence. In 17th Annual Conference of the International Speech Communication Association (Interspeech-2016), pp. 1074–1078, San Francisco, California, September 2016.
DOI: 10.21437/Interspeech.2016-926
[Publication 6]: Sofoklis Kakouros and Okko Räsänen. 3PRO – An Unsupervised Method for the Automatic Detection of Sentence Prominence in Speech. Speech Communication, 82, 67–84, September 2016.
DOI: 10.1016/j.specom.2016.06.004
[Publication 7]: Sofoklis Kakouros, Nelli Salminen, and Okko Räsänen. Making Predictable Unpredictable with Style – Behavioral and Electrophysiological Evidence for the Critical Role of Prosodic Expectations in the Perception of Prominence in Speech. Submitted to Neuropsychologia.