Browsing by Author "Laaksonen, Laura"
- Artificial bandwidth extension of narrowband speech - enhanced speech quality and intelligibility in mobile devices
School of Electrical Engineering | Doctoral dissertation (article-based) (2013) Laaksonen, Laura
Even today, most telephone users are offered only narrowband speech transmission. The limited frequency band from 300 Hz to 3400 Hz reduces both the quality and the intelligibility of speech because the missing high-frequency components are important cues, especially in consonant sounds. Particularly in mobile communications, which often take place in noisy environments, degraded speech intelligibility results in listener fatigue and difficulty in speaker recognition. The deployment of wideband (50–7000 Hz) and superwideband (50–14000 Hz) speech transmission is ongoing, but current narrowband speech coding will coexist with the new technologies for years to come. In this thesis, a speech enhancement method for narrowband speech called artificial bandwidth extension (ABE) is studied. ABE methods aim to improve the quality and intelligibility of narrowband speech by regenerating the missing high-frequency content of the speech signal, typically in the frequency range 4 kHz–8 kHz. Since the enhanced speech quality is achieved without any additional transmitted information, the algorithm can be implemented at the receiving end of a communication link, for example in a mobile device after the speech signal has been decoded. This thesis presents algorithms for artificially extending the speech bandwidth. The methods are primarily designed for monaural speech signals, but the extension of binaural speech signals is also addressed. The algorithms are developed so that they incur reasonable computational costs, memory consumption, and algorithmic delays for mobile communications. These and other implementation issues related to mobile devices are addressed here. The performance of the methods has been evaluated in several subjective tests, including listening-opinion tests in several languages, intelligibility tests, and conversational tests. The evaluations have mostly been carried out with coded speech to provide realistic results. The results of the subjective evaluations show that artificial bandwidth extension can improve the quality and intelligibility of narrowband speech signals in mobile communications. Further evidence of the reliability of the methods has been obtained from successful product implementations.
- Automatic classification of vocal intensity category from speech
A4 Article in conference proceedings (2023) Kodali, Manila; Kadiri, Sudarsana; Laaksonen, Laura; Alku, Paavo
Regulation of vocal intensity is a fundamental phenomenon in speech communication. Vocal intensity can be quantified using sound pressure level (SPL), which can be measured easily by recording a standard calibration signal together with the speech and comparing the energy of the recorded speech signal with that of the calibration tone. Unfortunately, speech recordings are mostly conducted without the SPL calibration signal, and speech signals are saved to databases using arbitrary amplitude scales. Therefore, neither the SPL nor the intensity category (e.g. soft or loud phonation) of a saved speech signal can be determined afterwards. Even though the original level information is lost when the signal is presented on an arbitrary amplitude scale, the speech signal contains other acoustic cues of vocal intensity. In the current study, we investigate machine learning- and deep learning-based methods for automatic classification of the vocal intensity category when the input speech is expressed on an arbitrary amplitude scale. A new gender-balanced database consisting of speech produced in four vocal intensity categories (soft, normal, loud, and very loud) was first recorded. Support vector machine and deep neural network (DNN) models were used to develop automatic classification systems using spectrograms, mel-spectrograms, and mel-frequency cepstral coefficients as features. The DNN classifier using the mel-spectrogram showed the best classification accuracy, about 90%. The database is made publicly available at https://bit.ly/3tLPGRx
- AVID: A speech database for machine learning studies on vocal intensity
A1 Original article in a scientific journal (2024-02) Alku, Paavo; Kodali, Manila; Laaksonen, Laura; Kadiri, Sudarsana
Vocal intensity, which is typically quantified with the sound pressure level (SPL), is a key feature of speech. To measure SPL from speech recordings, a standard calibration tone (with a reference SPL of 94 dB or 114 dB) needs to be recorded together with the speech. However, most of the popular databases used in areas such as speech and speaker recognition have been recorded without calibration information, with speech expressed on arbitrary amplitude scales. Therefore, information about the vocal intensity of the recorded speech, including SPL, is lost. In the current study, we introduce a new open and calibrated speech/electroglottography (EGG) database named Aalto Vocal Intensity Database (AVID). AVID includes speech and EGG produced by 50 speakers (25 males, 25 females) who varied their vocal intensity in four categories (soft, normal, loud and very loud). Recordings were conducted using a constant mouth-to-microphone distance and by recording a calibration tone. The speech data was labelled sentence-wise using a total of 19 labels that support the use of the data in machine learning (ML)-based studies of vocal intensity based on supervised learning. In order to demonstrate how the AVID data can be used to study vocal intensity, we investigated one multi-class classification task (classification of speech into soft, normal, loud and very loud intensity classes) and one regression task (prediction of the SPL of speech). In both tasks, we deliberately warped the level of the input speech by normalising the signal to have its maximum amplitude equal to 1.0; that is, we simulated a scenario that is prevalent in current speech databases. The results show that using the spectrogram feature with the support vector machine classifier gave an accuracy of 82% in the multi-class classification of the vocal intensity category. In the prediction of SPL, using the spectrogram feature with the support vector regressor gave a mean absolute error of about 2 dB and a coefficient of determination of 92%. We welcome researchers interested in classification and regression problems to use AVID in the study of vocal intensity, and we hope that the current results can serve as baselines for future ML studies on the topic.
- Influencing factors in selecting interior lining products in Russia
Helsinki University of Technology | Master's thesis (2007) Laaksonen, Laura
Housing construction in Russia has grown by 10–15 percent annually. 25 percent of all housing construction is concentrated in Moscow and the surrounding region. This thesis studies the market for high-quality interior lining materials in Moscow and St. Petersburg. The aim of the study was to identify the marketing and communication channels relevant to the selection process for interior lining materials. The target group was high-income Russian households with an interest in interior decoration and designer furniture. The thesis was commissioned by UPM-Kymmene Wood Products Division. The study is based on interviews conducted in Moscow in February 2007. The theoretical review covers the fundamentals of entering new markets and of new value innovations. The market review examines the Russian market at a general level and, in more detail, the construction sector and consumer markets and their development. The results are concrete recommendations for the commissioning company on marketing high-quality interior decoration products in Russia.