Browsing by Author "Kakouros, Sofoklis"
- Cognitive and probabilistic basis of prominence perception in speech
School of Electrical Engineering | Doctoral dissertation (article-based) (2017) Kakouros, Sofoklis
The research in this thesis examines the topic of the cognitive and probabilistic nature of prominence perception in speech. In recent years, there has been an accumulating number of studies from linguistics, phonetics, and neuroscience providing evidence that (i) prominence is related to attention- and expectation-based factors, (ii) frequency and predictability effects hold an important role in language processing, accounting for several linguistic phenomena, and (iii) the human brain represents information in a probabilistic way, with humans behaving as optimal probabilistic observers. On the basis of this evidence, the relationship between prominence, attention, and predictability is explored. A hypothesis is proposed suggesting that prominence perception in speech is connected with the unpredictability of prosodic features that draw the listeners' attention to the surprising aspects of the input. This thesis consists of a series of computational and behavioral studies that investigate different aspects of the prominence–attention–predictability tripartite. The core idea throughout this work is to investigate the probabilistic relations that take place at the acoustic prosodic domain through statistical modeling of the acoustic correlates of prominence, examining their relationship with the concurrent prominent/non-prominent units. As the probabilistic view of prominence also implies that listeners utilize some type of statistical learning mechanism operating at the suprasegmental acoustic prosodic level, a number of behavioral experiments are also conducted. The aim of these experiments is to understand whether human listeners are sensitive to the statistical regularities of suprasegmental speech acoustics and, if so, to what extent. A basic application of statistical models for the automatic detection of prominence in speech is also reported.
As a result of these studies, the thesis shows that predictability at the acoustic prosodic level is strongly correlated with human listeners' perception of prominence in speech. This statistical connection, however, is not fixed but depends on the listeners' experience with the language and thereby with subjective expectations of prosodic outcomes. This is illuminated by results that show that the human perceptual system appears to quickly adapt to the suprasegmental probabilistic structure of the incoming speech, causing the prosodic patterns that are less frequent in the recent discourse-specific acoustics to be more prominent. Thus, the experiments indicate a type of statistical learning mechanism operating at the suprasegmental acoustic level. Finally, a practical application of the predictability framework to the unsupervised detection of prominence in speech is described. Experiments in several languages show that the method provides high agreement with human judgments of prominence despite not having access to prominence labeling during training of the detector.
- Comparison of spectral tilt measures for sentence prominence in speech: Effects of dimensionality and adverse noise conditions
A1 Original article in a scientific journal (2018-10-01) Kakouros, Sofoklis; Räsänen, Okko; Alku, Paavo
Linguistic prominence in speech is known to correlate with the acoustic measures of energy, F0, and duration. In contrast, the role of spectral tilt in the realization of prominence has remained more inconsistent between previous empirical investigations. This may be partially due to the lack of a standard method for quantifying spectral tilt or due to difficulties in estimating the acoustical source of spectral tilt, the glottal flow, from continuous speech. These issues have rendered interpretations and comparisons between studies difficult. In addition, (i) little is known about the robustness of tilt estimators for prominence detection in the case when speech is not clean but corrupted, as in real life, by environmental noise or telephone transmission (i.e., degradation caused by bandpass filtering and quantization noise). Moreover, (ii) little attention has been paid to multidimensional representations of source spectrum that can potentially incorporate more information about the phonation style than purely scalar measures. In this work, we study spectral tilt in signaling prominence in spoken Dutch and French under different levels of additive noise, and for telephone-band coded speech, and compare several one-dimensional tilt measures that have been previously encountered in the literature as well as multidimensional tilt measures. We also compare spectral tilt measures with other standard acoustic correlates for prominence, namely, energy, F0, and duration. Our results provide further empirical support for the finding that tilt is a systematic correlate of prominence in Dutch, that the role is smaller in French, and that energy, F0, and duration appear still to be the most robust features for discriminating prominent and non-prominent words.
In addition, our results show that there are notable differences between different tilt measures at different levels of noise, and that multidimensional representations for tilt improve class separability from the scalar measures.
- Cross-linguistic Influences on Sentence Accent Detection in Background Noise
A1 Original article in a scientific journal (2020-03-01) Scharenborg, Odette; Kakouros, Sofoklis; Post, Brechtje; Meunier, Fanny
This paper investigates whether sentence accent detection in a non-native language is dependent on (relative) similarity between prosodic cues to accent between the non-native and the native language, and whether cross-linguistic differences in the use of local and more widely distributed (i.e., non-local) cues to sentence accent detection lead to differential effects of the presence of background noise on sentence accent detection in a non-native language. We compared Dutch, Finnish, and French non-native listeners of English, whose cueing and use of prosodic prominence is gradually further removed from English, and compared their results on a phoneme monitoring task in different levels of noise and a quiet condition to those of native listeners. Overall phoneme detection performance was high for the native and the non-native listeners, but deteriorated to the same extent in the presence of background noise. Crucially, relative similarity between the prosodic cues to sentence accent of one’s native language compared to that of a non-native language does not determine the ability to perceive and use sentence accent for speech perception in that non-native language. Moreover, proficiency in the non-native language is not a straightforward predictor of sentence accent perception performance, although high proficiency in a non-native language can seemingly overcome certain differences at the prosodic level between the native and non-native language. Instead, performance is determined by the extent to which listeners rely on local cues (English and Dutch) versus cues that are more distributed (Finnish and French), as more distributed cues survive the presence of background noise better.
- The Effect of Noise on Emotion Perception in an Unknown Language
A4 Article in conference proceedings (2018) Scharenborg, Odette; Kakouros, Sofoklis; Koemans, Jiska
This is the first study investigating the influence of “realistic” noise on verbal emotion perception in an unknown language. We do so by linking emotion perception to acoustic characteristics known to be correlated with emotion perception and investigating the effect of noise on the perception of these acoustic characteristics. Dutch students listened to Italian sentences in five emotions and were asked to indicate the emotion that was conveyed in the sentence. Sentences were presented in a clean and two babble noise conditions. Results showed that the participants were able to recognise emotions in the unknown language, and continued to perform above chance even in fairly bad listening conditions, indicating that verbal emotion may contain universal characteristics. Noise had a similar detrimental effect on the perception of the different emotions, though the impact on the use of the acoustic parameters for different emotion categories was different.
- The Effects of a Digital Articulatory Game on the Ability to Perceive Speech-Sound Contrasts in Another Language
A1 Original article in a scientific journal (2021-05-20) Ylinen, Sari; Smolander, Anna Riikka; Karhila, Reima; Kakouros, Sofoklis; Lipsanen, Jari; Huotilainen, Minna; Kurimo, Mikko
Digital and mobile devices enable easy access to applications for the learning of foreign languages. However, experimental studies on the effectiveness of these applications are scarce. Moreover, it is not understood whether the effects of speech and language training generalize to features that are not trained. To this end, we conducted a four-week intervention that focused on articulatory training and learning of English words in 6–7-year-old Finnish-speaking children who used a digital language-learning game app Pop2talk. An essential part of the app is automatic speech recognition that enables assessing children’s utterances and giving instant feedback to the players. The generalization of the effects of such training in English was explored by using discrimination tasks before and after training (or the same period of time in a control group). The stimuli of the discrimination tasks represented phonetic contrasts from two non-trained languages, including Russian sibilant consonants and Mandarin tones. We found some improvement with the Russian sibilant contrast in the gamers but it was not statistically significant. No improvement was observed for the tone contrast for the gaming group. A control group with no training showed no improvement in either contrast. The pattern of results suggests that the game may have improved the perception of non-trained speech sounds in some but not all individuals, yet the effects of motivation and attention span on their performance could not be excluded with the current methods. Children’s perceptual skills were linked to their word learning in the control group but not in the gaming group where recurrent exposure enabled learning also for children with poorer perceptual skills.
Together, the results demonstrate beneficial effects of learning via a digital application, yet raise a need for further research into individual differences in learning.
- Evaluation of Spectral Tilt Measures for Sentence Prominence Under Different Noise Conditions
A4 Article in conference proceedings (2017-08) Kakouros, Sofoklis; Räsänen, Okko; Alku, Paavo
Spectral tilt has been suggested to be a correlate of prominence in speech, although several studies have not replicated this empirically. This may be partially due to the lack of a standard method for tilt estimation from speech, rendering interpretations and comparisons between studies difficult. In addition, little is known about the performance of tilt estimators for prominence detection in the presence of noise. In this work, we investigate and compare several standard tilt measures on quantifying prominence in spoken Dutch under different levels of additive noise. We also compare these measures with other acoustic correlates of prominence, namely, energy, F0, and duration. Our results provide further empirical support for the finding that tilt is a systematic correlate of prominence, at least in Dutch, even though energy, F0, and duration appear still to be more robust features for the task. In addition, our results show that there are notable differences between different tilt estimators in their ability to discriminate prominent words from non-prominent ones in different levels of noise.
- Is infant-directed speech interesting because it is surprising? – Linking properties of IDS to statistical learning and attention at the prosodic level
A1 Original article in a scientific journal (2018-09-01) Räsänen, Okko; Kakouros, Sofoklis; Soderstrom, Melanie
The exaggerated intonation and special rhythmic properties of infant-directed speech (IDS) have been hypothesized to attract infants’ attention to the speech stream. However, there has been little work actually connecting the properties of IDS to models of attentional processing or perceptual learning. A number of such attention models suggest that surprising or novel perceptual inputs attract attention, where novelty can be operationalized as the statistical (un)predictability of the stimulus in the given context. Since prosodic patterns such as F0 contours are accessible to young infants who are also known to be adept statistical learners, the present paper investigates a hypothesis that F0 contours in IDS are less predictable than those in adult-directed speech (ADS), given previous exposure to both speaking styles, thereby potentially tapping into basic attentional mechanisms of the listeners in a manner similar to how relative probabilities of other linguistic patterns are known to modulate attentional processing in infants and adults. Computational modeling analyses with naturalistic IDS and ADS speech from matched speakers and contexts show that IDS intonation has lower overall temporal predictability even when the F0 contours of both speaking styles are normalized to have equal means and variances. A closer analysis reveals that there is a tendency of IDS intonation to be less predictable at the end of short utterances, whereas ADS exhibits more stable average predictability patterns across the full extent of the utterances. The difference between IDS and ADS persists even when the proportion of IDS and ADS exposure is varied substantially, simulating different relative amounts of IDS heard in different family and cultural environments.
Exposure to IDS is also found to be more efficient for predicting ADS intonation contours in new utterances than exposure to an equal amount of ADS speech. This indicates that the more variable prosodic contours of IDS also generalize to ADS, and may therefore enhance prosodic learning in infancy. Overall, the study suggests that one reason behind infant preference for IDS could be its higher information value at the prosodic level, as measured by the amount of surprisal in the F0 contours. This provides the first formal link between the properties of IDS and the models of attentional processing and statistical learning in the brain. However, this finding does not rule out the possibility that other differences between IDS and ADS also play a role.
- Sentence Accent Perception in Noise by French Non-Native Listeners of English
A4 Article in conference proceedings (2018) Scharenborg, Odette; Meunier, Fanny; Kakouros, Sofoklis; Post, Brechtje
This paper investigates the use of prosodic information signalling sentence accent and the role of different acoustic features on sentence accent perception during native and non-native speech perception in the presence of background noise. A phoneme detection experiment was carried out in which English native listeners and French highly proficient non-native listeners of English were presented with target phonemes in English sentences. Sentences were presented in different levels of speech-shaped noise and in two prosodic contexts in which the target-bearing word was either deaccented or accented. Acoustic analyses of the two prosodic conditions showed that the target-bearing words in the accented condition carried more energy, had a higher F0, and more spectral tilt than those in the deaccented condition. Results of the behavioural data showed that the native listeners outperformed the French listeners in the clean condition but not in the noise conditions and that the effect of noise was smaller for the non-native compared to the native listeners. Possibly, the non-native listeners use more and different acoustic cues than the native listeners who primarily relied on more local cues for sentence accent detection.