Object-based modelling for representing and processing speech corpora

Doctoral thesis (monograph)

Date

2001-09-28

Language

en

Pages

92

Series

Report / Helsinki University of Technology Laboratory of Acoustics and Audio Signal Processing, Raportti / Teknillinen korkeakoulu, akustiikan ja äänenkäsittelytekniikan laboratorio, 63

Abstract

This thesis deals with modelling the data in large speech corpora using an object-oriented paradigm that captures important linguistic structures. Information from corpora is transformed into objects, which are assigned properties describing their behaviour. These objects, called speech units, are placed on a multi-dimensional framework, and their relationships to other units are explicitly defined through links. Frameworks that model temporal utterances, or atemporal information such as speaker characteristics and recording conditions, can be searched efficiently for contextual matches. Speech units that match the desired contexts are the result of successful, linguistically motivated queries and can be used in further speech processing tasks in the same computational environment. This allows empirical studies of speech and its relation to linguistic structures to be carried out, and supports the training and testing of applications such as speech recognition and synthesis. The information residing in typical speech corpora is discussed first, followed by an overview of object-orientation, which sets the tone for the thesis. The representation framework is then introduced; it is generated by a compiler and linker that rely on a set of domain-specific resources to transform corpus data into speech units. Operations on this framework are then presented, along with a comparison between a relational and an object-oriented model of identical speech data. The models described in this work are directly applicable to existing large speech corpora, and the methods developed here are tested against relational database methods. For typical linguistically relevant queries, the object-oriented methods outperform the relational methods by about three orders of magnitude, as measured by database search times. This improvement in simplicity of representation and search speed is crucial for the utilisation of large multi-lingual corpora in basic research on the detailed properties of speech, especially in relation to contextual variation.
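
The idea of speech units joined by explicit links and searched by context can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class `SpeechUnit`, its link names (`parent`, `children`, `prev`, `next`), the function `find_in_context` and the toy utterance are all hypothetical names chosen for illustration, and neighbour links here only connect units that share a parent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False, repr=False)
class SpeechUnit:
    """Hypothetical speech unit: a labelled object with explicit links."""
    level: str                                   # e.g. "utterance", "word", "phone"
    label: str
    parent: Optional["SpeechUnit"] = None        # link to the enclosing unit
    children: List["SpeechUnit"] = field(default_factory=list)
    prev: Optional["SpeechUnit"] = None          # left neighbour under the same parent
    next: Optional["SpeechUnit"] = None          # right neighbour under the same parent

    def add_child(self, child: "SpeechUnit") -> "SpeechUnit":
        """Attach a child and link it to its left sibling, if any."""
        child.parent = self
        if self.children:
            last = self.children[-1]
            last.next = child
            child.prev = last
        self.children.append(child)
        return child

    def descendants(self, level: str) -> Iterator["SpeechUnit"]:
        """Yield all units of the given level below this unit, in order."""
        for child in self.children:
            if child.level == level:
                yield child
            yield from child.descendants(level)


def find_in_context(root: SpeechUnit, level: str, label: str,
                    left: Optional[str] = None,
                    right: Optional[str] = None) -> List[SpeechUnit]:
    """Return units with the given label whose neighbours match the context."""
    hits = []
    for unit in root.descendants(level):
        if unit.label != label:
            continue
        if left is not None and (unit.prev is None or unit.prev.label != left):
            continue
        if right is not None and (unit.next is None or unit.next.label != right):
            continue
        hits.append(unit)
    return hits


# Toy data (invented for this sketch): one utterance, two words, six phones.
utterance = SpeechUnit("utterance", "cat tack")
cat = utterance.add_child(SpeechUnit("word", "cat"))
tack = utterance.add_child(SpeechUnit("word", "tack"))
for phone in "k ae t".split():
    cat.add_child(SpeechUnit("phone", phone))
for phone in "t ae k".split():
    tack.add_child(SpeechUnit("phone", phone))

# Query: every "ae" phone that is immediately preceded by "k".
hits = find_in_context(utterance, "phone", "ae", left="k")
print([hit.parent.label for hit in hits])  # → ['cat']
```

Because each match is itself an object carrying its links, a query result can be passed directly to further processing (for example, following `parent` to reach the containing word) without a second lookup, which is the kind of convenience the abstract attributes to the object-oriented model.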

Keywords

speech corpora, speech database, object-oriented model, database access, speech processing

Permanent link to this item

https://urn.fi/urn:nbn:fi:tkk-002940