[diss] Perustieteiden korkeakoulu / SCI
Browsing [diss] Perustieteiden korkeakoulu / SCI by Department "Department of Computer Science and Engineering"
Item Accessing multiversion data in database transactions(Aalto-yliopiston teknillinen korkeakoulu, 2010) Haapasalo, Tuukka; Sippu, Seppo, Prof.; Tietotekniikan laitos; Department of Computer Science and Engineering; Aalto-yliopiston teknillinen korkeakoulu; Soisalon-Soininen, Eljas, Prof.Many important database applications need to access previous versions of the data set, thus requiring that the data are stored in a multiversion database and indexed with a multiversion index, such as the multiversion B+-tree (MVBT) of Becker et al. The MVBT is optimal, so that any version of the database can be accessed as efficiently as with a single-version B+-tree that is used to index only the data items of that version, but it cannot be used in a full-fledged database system because it follows a single-update model, and the update cannot be rolled back. We have redesigned the MVBT index so that a single multi-action updating transaction can operate on the index structure concurrently with multiple concurrent read-only transactions. Data items created by the transaction become part of the same version, and the transaction can roll back. We call this structure the transactional MVBT (TMVBT). The TMVBT index remains optimal even in the presence of logical key deletions. Even though deletions in a multiversion index must not physically delete the history of the data items, queries and range scans can become more efficient, if the leaf pages of the index structure are merged to retain optimality. For the general transactional setting with multiple updating transactions, we propose a multiversion database structure called the concurrent MVBT (CMVBT), which stores the updates of active transactions in a separate main-memory-resident versioned B+-tree index. A system maintenance transaction is periodically run to apply the updates of committed transactions into the TMVBT index. We show how multiple updating transactions can operate on the CMVBT index concurrently, and our recovery algorithm is based on the standard ARIES recovery algorithm. We prove that the TMVBT index is asymptotically optimal, and show that the performance of the CMVBT index in general transaction processing is on par with the performance of the time-split B+-tree (TSB-tree) of Lomet and Salzberg. The TSB-tree does not merge leaf pages and is therefore not optimal if logical data-item deletions are allowed. Our experiments show that the CMVBT outperforms the TSB-tree with range queries in the presence of deletions.Item Adaptive combinations of classifiers with application to on-line handwritten character recognition(Helsinki University of Technology, 2007-03-29) Aksela, Matti; Department of Computer Science and Engineering; Tietotekniikan osasto; Laboratory of Computer and Information Science; Informaatiotekniikan laboratorioClassifier combining is an effective way of improving classification performance. User adaptation is clearly another valid approach for improving performance in a user-dependent system, and even though adaptation is usually performed on the classifier level, also adaptive committees can be very effective. Adaptive committees have the distinct ability of performing adaptation without detailed knowledge of the classifiers. Adaptation can therefore be used even with classification systems that intrinsically are not suited for adaptation, whether that be due to lack of access to the workings of the classifier or simply a classification scheme not suitable for continuous learning. 
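As an aside to the adaptive-committee idea described above, the following is a minimal sketch of a committee that adapts per-member voting weights from user-confirmed labels while leaving the member classifiers themselves untouched. It only illustrates the general principle; it is not the DEC or CCCC scheme of the thesis, and all names (WeightedVoteCommittee, adapt) are hypothetical.

```python
from collections import defaultdict

class WeightedVoteCommittee:
    """Toy adaptive committee: fixed member classifiers, per-member weights
    nudged up or down according to user-confirmed labels.
    Illustrative only; not the DEC or CCCC schemes of the thesis."""

    def __init__(self, members, learning_rate=0.1):
        self.members = members            # callables: sample -> predicted label
        self.weights = [1.0] * len(members)
        self.lr = learning_rate

    def classify(self, sample):
        votes = defaultdict(float)
        for clf, w in zip(self.members, self.weights):
            votes[clf(sample)] += w
        return max(votes, key=votes.get)

    def adapt(self, sample, true_label):
        # Reward members that were right on this sample, penalise the rest;
        # the member classifiers are never modified, only their weights.
        for i, clf in enumerate(self.members):
            correct = clf(sample) == true_label
            self.weights[i] *= (1 + self.lr) if correct else (1 - self.lr)
```

The point of the sketch is that adaptation needs only each member's predicted label, which is why committee-level adaptation works even for classifiers that cannot themselves be retrained.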
This thesis proposes methods for adaptive combination of classifiers in the setting of on-line handwritten character recognition. The focal part of the work introduces adaptive classifier combination schemes, of which the two most prominent ones are the Dynamically Expanding Context (DEC) committee and the Class-Confidence Critic Combining (CCCC) committee. Both have been shown to be capable of successful adaptation to the user in the task of on-line handwritten character recognition. Particularly the highly modular CCCC framework has shown impressive performance also in a doubly-adaptive setting of combining adaptive classifiers by using an adaptive committee. In support of this main topic of the thesis, some discussion on a methodology for deducing correct character labeling from user actions is presented. Proper labeling is paramount for effective adaptation, and deducing the labels from the user's actions is necessary to perform adaptation transparently to the user. In that way, the user does not need to give explicit feedback on the correctness of the recognition results. Also, an overview is presented of adaptive classification methods for single-classifier adaptation in handwritten character recognition developed at the Laboratory of Computer and Information Science of the Helsinki University of Technology, CIS-HCR. Classifiers based on the CIS-HCR system have been used in the adaptive committee experiments as both member classifiers and to provide a reference level. Finally, two distinct approaches for improving the performance of committee classifiers further are discussed. Firstly, methods for committee rejection are presented and evaluated. Secondly, measures of classifier diversity for classifier selection, based on the concept of diversity of errors, are presented and evaluated. The topic of this thesis hence covers three important aspects of pattern recognition: on-line adaptation, combining classifiers, and a practical evaluation setting of handwritten character recognition. A novel approach combining these three core ideas has been developed and is presented in the introductory text and the included publications. To reiterate, the main contributions of this thesis are: 1) introduction of novel adaptive committee classification methods, 2) introduction of novel methods for measuring classifier diversity, 3) presentation of some methods for implementing committee rejection, 4) discussion and introduction of a method for effective label deduction from on-line user actions, and as a side-product, 5) an overview of the CIS-HCR adaptive on-line handwritten character recognition system.Item Adaptive methods for on-line recognition of isolated handwritten characters(Helsinki University of Technology, 2002-12-14) Vuori, Vuokko; Department of Computer Science and Engineering; Tietotekniikan osasto; Laboratory of Computer and Information Science; Informaatiotekniikan laboratorioThe main goal of the work presented in this thesis has been the development of an on-line handwriting recognition system which is able to recognize handwritten characters of several different writing styles and is able to improve its performance by adapting itself to new writing styles. The recognition method should be applicable to hand-held devices of limited memory and computational resources. The adaptation process should take place during normal use of the device, not in some specific training mode. 
For the usability aspect of the recognition system, the recognition and adaptation processes should be easily understandable to the users. The first part of this thesis gives an introduction to the handwriting recognition. The topics considered include: the variations present in personal handwriting styles; automatic grouping of similar handwriting styles; the differences between writer-independent and writer-dependent as well as on-line and off-line handwriting recognition problems; the different approaches to on-line handwriting recognition; the previous adaptive recognition systems and the experiments performed with them; the recognition performance requirements and other usability issues related to on-line handwriting recognition; the current trends in on-line handwriting recognition research; the recognition results obtained with the most recent recognition systems; and the commercial applications. The second part of the thesis describes an adaptive on-line character recognition system and the experiments performed with it. The recognition system is based on prototype matching. The comparisons between the character samples and prototypes are based on the Dynamical Time Warping (DTW) algorithm and the input characters are classified according to the k Nearest Neighbors (k-NN) rule. The initial prototype set is formed by clustering character samples collected from a large number of subjects. Thus, the recognition system can handle various writing styles. This thesis work introduces four DTW-based clustering algorithms which can be used for the prototype selection. The recognition system adapts to new writing styles by modifying its prototype set. This work introduces several adaptation strategies which add new writer-dependent prototypes into the initial writer-independent prototype set, reshape the existing prototypes with a Learning Vector Quantization (LVQ)-based algorithm, and inactivate poorly performing prototypes. The adaptations are carried out on-line in a supervised or self-supervised fashion. In the former case, the user explicitly labels the input characters which are used as training samples in the adaptation process. In the latter case, the system deduces the labels from the recognition results and the user's actions. The latter approach is prone to erroneously labeled learning samples. The different adaptation strategies were experimented with and compared with each other by performing off-line simulations and genuine on-line user experiments. In the simulations, special attention has been paid to the various erroneous learning situations likely to be encountered in real world handwriting recognition tasks. The recognition system is able to improve its recognition accuracy significantly on the basis of only a few additional character samples per class. Recognition accuracies acceptable in real world applications can be attained for most of the test subjects. This work also introduces a Self-Organizing Map (SOM)-based method for analyzing personal writing styles. Personal writing styles are represented by high-dimensional vectors, the components of which indicate the subjects' tendencies to use certain prototypical writing styles for isolated characters. 
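The recognition system described above compares input characters with prototypes using DTW and classifies them with the k-NN rule. Below is a generic, unoptimized DTW distance and a 1-NN classifier over labeled prototype sequences; it is a plain textbook formulation, not the exact distance or matching variant used in the thesis.

```python
import math

def dtw_distance(a, b):
    """Plain dynamic-programming DTW between two point sequences
    a, b = [(x, y), ...]; Euclidean local cost, no slope constraints."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def classify_1nn(sample, prototypes):
    """prototypes: list of (label, point_sequence); returns the label of
    the nearest prototype under DTW (k = 1 for brevity)."""
    return min(prototypes, key=lambda p: dtw_distance(sample, p[1]))[0]
```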
These writing style vectors are then visualized by a SOM which enables the detection and analysis of clusters of similar writing styles.Item Adaptive probabilistic roadmap construction with multi-heuristic local planning(Helsinki University of Technology, 2003-09-26) Isto, Pekka; Department of Computer Science and Engineering; Tietotekniikan osasto; Industrial IT Laboratory; Teollisuuden tietotekniikan laboratorioThe motion planning problem means the computation of a collision-free motion for a movable object among obstacles from the given initial placement to the given end placement. Efficient motion planning methods have many applications in many fields, such as robotics, computer aided design, and pharmacology. The problem is known to be PSPACE-hard. Because of the computational complexity, practical applications often use heuristic or incomplete algorithms. Probabilistic roadmap is a probabilistically complete motion planning method that has been an object of intensive study over the past years. The method is known to be susceptible to the problem of "narrow passages": Finding a motion that passes a narrow, winding tunnel can be very expensive. This thesis presents a probabilistic roadmap method that addresses the narrow passage problem with a local planner based on heuristic search. The algorithm is suitable for planning motions for rigid bodies and articulated robots including multi-robot systems with many degrees-of-freedom. Variants of the algorithm are described for single query planning, single query planning on a distributed memory parallel computer, and a preprocessing type of learning algorithm for multiple query planning. An empirical study of the effect of balance between local and global planning reveals that no universal optimal balance is likely to exist. Furthermore, it appears that many traditional simple local planners are too weak for efficient solving of problems with a narrow passage. The empirical results show that local planners based on backtracking search are more efficient than the more traditional local planners when a motion through a narrow passage is to be planned. The parallel variant has acceptable scalability on a parallel computer built from commodity components. It is also observed that run-time adjustment of the parameters of the search can reduce the variance of the run-cost. The run-cost variance is a known, but little studied deficiency of randomized motion planning methods. It is suggested that the future research in randomized motion planning algorithms should address run-cost variance as an important performance characteristic of the algorithm. The results are obtained with empirical methods and established procedures from design and analysis of experiments. The algorithms are assessed with a number of test problems including known benchmark problems from the literature.Item Advanced source separation methods with applications to spatio-temporal datasets(Helsinki University of Technology, 2006-11-03) Ilin, Alexander; Department of Computer Science and Engineering; Tietotekniikan osasto; Laboratory of Computer and Information Science; Informaatiotekniikan laboratorioLatent variable models are useful tools for statistical data analysis in many applications. Examples of popular models include factor analysis, state-space models and independent component analysis. 
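As a minimal illustration of the linear source separation problem referred to above, the following sketch mixes two independent sources with an unknown matrix and recovers them with scikit-learn's FastICA. It only demonstrates the problem setting; the Bayesian and other algorithms developed in the thesis are not shown here, and the signals are arbitrary toy choices.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two statistically independent sources and an unknown mixing matrix.
s1 = np.sign(np.sin(3 * t))          # square wave
s2 = rng.laplace(size=t.size)        # super-Gaussian noise source
S = np.c_[s1, s2]
A = np.array([[1.0, 0.5],
              [0.7, 1.2]])           # mixing matrix (unknown to the algorithm)
X = S @ A.T                          # observed mixtures

# Recover the sources up to permutation, sign and scaling.
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)
print(np.corrcoef(S[:, 0], S_est[:, 0])[0, 1],
      np.corrcoef(S[:, 0], S_est[:, 1])[0, 1])
```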
These types of models can be used for solving the source separation problem in which the latent variables should have a meaningful interpretation and represent the actual sources generating data. Source separation methods are the main focus of this work. Bayesian statistical theory provides a principled way to learn latent variable models and therefore to solve the source separation problem. The first part of this work studies variational Bayesian methods and their application to different latent variable models. The properties of variational Bayesian methods are investigated both theoretically and experimentally using linear source separation models. A new nonlinear factor analysis model which restricts the generative mapping to the practically important case of post-nonlinear mixtures is presented. The variational Bayesian approach to learning nonlinear state-space models is studied as well. This method is applied to the practical problem of detecting changes in the dynamics of complex nonlinear processes. The main drawback of Bayesian methods is their high computational burden. This complicates their use for exploratory data analysis in which observed data regularities often suggest what kind of models could be tried. Therefore, the second part of this work proposes several faster source separation algorithms implemented in a common algorithmic framework. The proposed approaches separate the sources by analyzing their spectral contents, decoupling their dynamic models or by optimizing their prominent variance structures. These algorithms are applied to spatio-temporal datasets containing global climate measurements from a long period of time.

Item Advances in assessment of programming skills(Aalto University, 2012) Seppälä, Otto; Korhonen, Ari, D.Sc (Tech); Tietotekniikan laitos; Department of Computer Science and Engineering; Perustieteiden korkeakoulu; School of Science; Malmi, Lauri, Professor
This thesis concerns assessment techniques used in university level programming education. The motivation is to improve existing assessment methods so that they yield more detailed or fundamentally different kinds of information which can be used to provide higher quality feedback to the student. One central theme in this thesis is the use of program reading and tracing skills in different aspects of programming. Tracing is a critical skill in reading and writing code and in debugging. Simple tracing exercises can be used to test understanding of programming language constructs and program execution. We present results from an international study of student competence in tracing program code at the end of their first programming course. The results highlight that while students are expected to have elementary skills in program construction, some of them lack knowledge of the execution of programs of the same difficulty. The effect of students' annotations and solving strategies on tracing performance was analyzed further. Tracing exercises can also be used to test understanding of data structures and algorithms. Visual algorithm simulation is a method in which a student manipulates data structure visualizations with a mouse, trying to simulate the steps of a given algorithm. A successful simulation is evidence of understanding the core concepts of that algorithm. Automatic assessment and delivery of visual algorithm simulation problems is implemented in the tool TRAKLA2.
In this thesis we present a technique that improves TRAKLA2's assessment, trying to interpret errors in student simulations using information on known misconceptions and by simulating careless errors. Another topic studied in this thesis is whether mutation testing can be applied to evaluating the adequacy of software tests made by students. In mutation testing the effectiveness of a test suite in discovering errors is evaluated by seeding errors into the program under test. A well constructed test suite should find most such errors quite easily. Code coverage analysis, the method used in available assessment platforms, can yield results that give the students false understanding of the quality of their testing. Feedback from programming exercises assessed by unit tests has traditionally been in the form of text. Explaining more complicated object hierarchies textually can be complicated to understand. The last topic covered is using visualization to convey information either in a domain specific visual format or in a form of a generic visualization. An empirical study of student performance comparing the two types of input against detailed textual feedback is presented as a part of the thesis.Item Advances in independent component analysis with applications to data mining(Helsinki University of Technology, 2003-12-12) Bingham, Ella; Department of Computer Science and Engineering; Tietotekniikan osasto; Laboratory of Computer and Information Science; Informaatiotekniikan laboratorioThis thesis considers the problem of finding latent structure in high dimensional data. It is assumed that the observed data are generated by unknown latent variables and their interactions. The task is to find these latent variables and the way they interact, given the observed data only. It is assumed that the latent variables do not depend on each other but act independently. A popular method for solving the above problem is independent component analysis (ICA). It is a statistical method for expressing a set of multidimensional observations as a combination of unknown latent variables that are statistically independent of each other. Starting from ICA, several methods of estimating the latent structure in different problem settings are derived and presented in this thesis. An ICA algorithm for analyzing complex valued signals is given; a way of using ICA in the context of regression is discussed; and an ICA-type algorithm is used for analyzing the topics in dynamically changing text data. In addition to ICA-type methods, two algorithms are given for estimating the latent structure in binary valued data. Experimental results are given on all of the presented methods. Another, partially overlapping problem considered in this thesis is dimensionality reduction. Empirical validation is given on a computationally simple method called random projection: it does not introduce severe distortions in the data. It is also proposed that random projection could be used as a preprocessing method prior to ICA, and experimental results are shown to support this claim. 
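The random projection claim above (projection to a much lower dimension without severe distortion) can be checked empirically in a few lines of NumPy; the sketch below uses a Gaussian random matrix and compares pairwise distances before and after projection. The dimensions and sample counts are arbitrary choices, not those of the thesis experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 200, 1000, 50            # n points, original dimension d, projected dimension k

X = rng.standard_normal((n, d))
R = rng.standard_normal((d, k)) / np.sqrt(k)   # Gaussian random projection
Y = X @ R

def pairwise_dists(Z):
    sq = np.sum(Z * Z, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * Z @ Z.T, 0.0))

orig, proj = pairwise_dists(X), pairwise_dists(Y)
mask = ~np.eye(n, dtype=bool)
ratios = proj[mask] / orig[mask]
print("distance ratio mean %.3f, std %.3f" % (ratios.mean(), ratios.std()))
```

The ratios stay close to one, which is the Johnson-Lindenstrauss-type behaviour that makes random projection usable as a cheap preprocessing step before ICA.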
This thesis also contains several literature surveys on various aspects of finding the latent structure in high dimensional data.

Item Advances in Methods of Anomaly Detection and Visualization of Multivariate Data(Aalto University, 2015) Talonen, Jaakko; Sirola, Miki, Docent, Aalto University, Department of Information and Computer Science, Finland; Sulkava, Mika, Dr., MTT Agrifood Research, Finland; Tietotekniikan laitos; Department of Computer Science and Engineering; Environmental and Industrial Machine Learning; Perustieteiden korkeakoulu; School of Science; Simula, Olli, Prof., Aalto University, Department of Information and Computer Science, Finland
Successful machine learning applications have been developed in almost all fields where measurable data exists. For example, computers can learn the best treatment for a particular disease from medical records, and self-customized programs can recommend different products for customers. In this thesis, statistical and machine learning methods have been applied to both time series and static multivariate data sets, which contain unknown and potentially useful information. Data can be understood better by developing new methods, because a large number of data samples and variables makes it difficult to interpret the research materials. The research material for the development of anomaly detection methods and for presenting the results consisted of process signal data from the Olkiluoto nuclear power plant, the results of the Parliamentary elections and the answers of the voting advice application, and aggregated car inspection data. Process state changes can be detected by the procedures and the visualization techniques developed in this research. These potential anomalies should be detected as early as possible using the signal measurements. Challenges related to stochastic processes have been solved using recursive models and neural networks. The results related to the static multivariate data demonstrate that the combination of principal component analysis and probability distributions makes it possible to estimate missing values and understand the dependencies of the observations. A significantly larger amount of missing data can be estimated by the recommender system, and the resulting complete data can then be explored by other machine learning methods, e.g. by a self-organizing map. These methods make it possible to analyze the missing-value dependencies of the multivariate data sets and thus improve the detection of anomalous observations. By applying the machine learning methods discussed in this thesis, the dramatically increasing amount of information can be utilized more effectively. Data can be transformed into an understandable form, existing anomalies can be detected in it, and the results can be used as decision support regardless of the research area.

Item Advances in variable selection and visualization methods for analysis of multivariate data(Helsinki University of Technology, 2007-10-19) Similä, Timo; Department of Computer Science and Engineering; Tietotekniikan osasto; Laboratory of Computer and Information Science; Informaatiotekniikan laboratorio
This thesis concerns the analysis of multivariate data. The amount of data that is obtained from various sources and stored in digital media is growing at an exponential rate. The data sets tend to be too large, in terms of the number of variables and the number of observations, to be analyzed by hand. In order to facilitate the task, the data set must be summarized somehow.
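One generic way to realize the PCA-based anomaly detection idea mentioned in the anomaly detection thesis listed above is to fit a low-dimensional PCA model to normal data and score observations by reconstruction error. The sketch below illustrates only that generic criterion; it is not the procedure developed in the thesis, and the data are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Mostly "normal" correlated observations plus a few injected anomalies
# (the anomalies end up at indices 500..504).
normal = rng.multivariate_normal(
    [0, 0, 0], [[1.0, 0.8, 0.6], [0.8, 1.0, 0.7], [0.6, 0.7, 1.0]], 500)
anomalies = rng.uniform(-6, 6, size=(5, 3))
X = np.vstack([normal, anomalies])

# Fit a 2-D PCA model of normal behaviour and score every observation by
# its reconstruction error; large error suggests an anomalous observation.
pca = PCA(n_components=2).fit(normal)
recon = pca.inverse_transform(pca.transform(X))
score = np.linalg.norm(X - recon, axis=1)
print("top-5 anomaly indices:", np.argsort(score)[-5:])
```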
This work introduces machine learning methods that are capable of finding interesting patterns automatically from the data. The findings can be further used in decision making and prediction. The results of this thesis can be divided into three groups. The first group of results is related to the problem of selecting a subset of input variables in order to build an accurate predictive model for several response variables simultaneously. Variable selection is a difficult combinatorial problem in essence, but the relaxations examined in this work transform it into a more tractable optimization problem of continuous-valued parameters. The main contribution here is extending several methods that are originally designed for a single response variable to be applicable with multiple response variables as well. Examples of such methods include the well known lasso estimate and the least angle regression algorithm. The second group of results concerns unsupervised variable selection, where all variables are treated equally without making any difference between responses and inputs. The task is to detect the variables that contain, in some sense, as much information as possible. A related problem that is also examined is combining the two major categories of dimensionality reduction: variable selection and subspace projection. Simple modifications of the multiresponse regression techniques developed in this thesis offer a fresh approach to these unsupervised learning tasks. This is another contribution of the thesis. The third group of results concerns extensions and applications of the self-organizing map (SOM). The SOM is a prominent tool in the initial exploratory phase of multivariate analysis. It provides a clustering and a visual low-dimensional representation of a set of high-dimensional observations. Firstly, an extension of the SOM algorithm is proposed in this thesis, which is applicable to strongly curvilinear but intrinsically low-dimensional data structures. Secondly, an application of the SOM is proposed to interpret nonlinear quantile regression models. Thirdly, a SOM-based method is introduced for analyzing the dependency of one multivariate data set on another.Item Advances in variational Bayesian nonlinear blind source separation(Helsinki University of Technology, 2005-05-13) Honkela, Antti; Department of Computer Science and Engineering; Tietotekniikan osasto; Laboratory of Computer and Information Science; Informaatiotekniikan laboratorioLinear data analysis methods such as factor analysis (FA), independent component analysis (ICA) and blind source separation (BSS) as well as state-space models such as the Kalman filter model are used in a wide range of applications. In many of these, linearity is just a convenient approximation while the underlying effect is nonlinear. It would therefore be more appropriate to use nonlinear methods. In this work, nonlinear generalisations of FA and ICA/BSS are presented. The methods are based on a generative model, with a multilayer perceptron (MLP) network to model the nonlinearity from the latent variables to the observations. The model is estimated using variational Bayesian learning. The variational Bayesian method is well-suited for the nonlinear data analysis problems. The approach is also theoretically interesting, as essentially the same method is used in several different fields and can be derived from several different starting points, including statistical physics, information theory, Bayesian statistics, and information geometry. 
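The multiresponse extension of the lasso mentioned in the variable selection thesis above can be illustrated with scikit-learn's MultiTaskLasso, which selects the same subset of input variables for all response variables jointly. The thesis's own formulation and algorithms may differ; this is only a standard off-the-shelf analogue on synthetic data.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(3)
n, p, q = 100, 20, 3                 # samples, input variables, response variables

X = rng.standard_normal((n, p))
W = np.zeros((p, q))
W[:4] = rng.standard_normal((4, q))  # only the first 4 inputs are truly relevant
Y = X @ W + 0.1 * rng.standard_normal((n, q))

# The group penalty zeroes out whole rows of the coefficient matrix,
# so an input is either used for all responses or for none of them.
model = MultiTaskLasso(alpha=0.1).fit(X, Y)
selected = np.flatnonzero(np.any(np.abs(model.coef_) > 1e-8, axis=0))
print("inputs selected jointly for all responses:", selected)
```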
These complementary views can provide benefits for interpretation of the operation of the learning method and its results. Much of the work presented in this thesis consists of improvements that make the nonlinear factor analysis and blind source separation methods faster and more stable, while being applicable to other learning problems as well. The improvements include methods to accelerate convergence of alternating optimisation algorithms such as the EM algorithm and an improved approximation of the moments of a nonlinear transform of a multivariate probability distribution. These improvements can be easily applied to other models besides FA and ICA/BSS, such as nonlinear state-space models. A specialised version of the nonlinear factor analysis method for post-nonlinear mixtures is presented as well.Item Algorithms for classification of combinatorial objects(Helsinki University of Technology, 2005-06-15) Kaski, Petteri; Department of Computer Science and Engineering; Tietotekniikan osasto; Laboratory for Theoretical Computer Science; Tietojenkäsittelyteorian laboratorioA recurrently occurring problem in combinatorics is the need to completely characterize a finite set of finite objects implicitly defined by a set of constraints. For example, one could ask for a list of all possible ways to schedule a football tournament for twelve teams: every team is to play against every other team during an eleven-round tournament, such that every team plays exactly one game in every round. Such a characterization is called a classification for the objects of interest. Classification is typically conducted up to a notion of structural equivalence (isomorphism) between the objects. For example, one can view two tournament schedules as having the same structure if one can be obtained from the other by renaming the teams and reordering the rounds. This thesis examines algorithms for classification of combinatorial objects up to isomorphism. The thesis consists of five articles – each devoted to a specific family of objects – together with a summary surveying related research and emphasizing the underlying common concepts and techniques, such as backtrack search, isomorphism (viewed through group actions), symmetry, isomorph rejection, and computing isomorphism. From an algorithmic viewpoint the focus of the thesis is practical, with interest on algorithms that perform well in practice and yield new classification results; theoretical properties such as the asymptotic resource usage of the algorithms are not considered. The main result of this thesis is a classification of the Steiner triple systems of order 19. The other results obtained include the nonexistence of a resolvable 2-(15, 5, 4) design, a classification of the one-factorizations of k-regular graphs of order 12 for k ≤ 6 and k = 10, 11, a classification of the near-resolutions of 2-(13, 4, 3) designs together with the associated thirteen-player whist tournaments, and a classification of the Steiner triple systems of order 21 with a nontrivial automorphism group.Item Algorithms for nonuniform networks(Helsinki University of Technology, 2006-04-28) Schaeffer, Satu Elisa; Department of Computer Science and Engineering; Tietotekniikan osasto; Laboratory for Theoretical Computer Science; Tietojenkäsittelyteorian laboratorioIn this thesis, observations on structural properties of natural networks are taken as a starting point for developing efficient algorithms for natural instances of different graph problems. 
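The classification-up-to-isomorphism setting of the thesis on combinatorial objects above can be made concrete with a deliberately naive sketch: enumerate all simple graphs on a few vertices and keep one representative per isomorphism class via a brute-force canonical form. Real classification algorithms, such as those in the thesis, rely on far more refined isomorph rejection; this only shows what "up to isomorphism" means computationally.

```python
from itertools import combinations, permutations

def canonical_form(n, edges):
    """Lexicographically smallest edge set over all vertex relabelings;
    feasible only for very small n, but it makes isomorphism explicit."""
    best = None
    for perm in permutations(range(n)):
        relabeled = {frozenset((perm[u], perm[v])) for u, v in edges}
        key = tuple(sorted(tuple(sorted(e)) for e in relabeled))
        if best is None or key < best:
            best = key
    return best

def graphs_up_to_iso(n):
    """Count all simple graphs on n vertices up to isomorphism
    (isomorph rejection by canonical form)."""
    pairs = list(combinations(range(n), 2))
    seen = set()
    for mask in range(1 << len(pairs)):
        edges = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
        seen.add(canonical_form(n, edges))
    return len(seen)

print(graphs_up_to_iso(4))   # 11 non-isomorphic simple graphs on 4 vertices
```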
The key areas discussed are sampling, clustering, routing, and pattern mining for large, nonuniform graphs. The results include observations on structural effects together with algorithms that aim to reveal structural properties or exploit their presence in solving an interesting graph problem. Traditionally, networks were modeled with uniform random graphs, assuming that each vertex was equally important and each edge equally likely to be present. Within the last decade, the approach has drastically changed due to the numerous observations on structural complexity in natural networks, many of which proved the uniform model to be inadequate for some contexts. This quickly led to various models and measures that aim to characterize topological properties of different kinds of real-world networks beyond the uniform networks. The goal of this thesis is to utilize such observations in algorithm design, in addition to empowering the process of network analysis. Knowing that a graph exhibits certain characteristics allows for more efficient storage, processing, analysis, and feature extraction. Our emphasis is on local methods that avoid resorting to information about the graph structure that is not relevant to the answer sought. For example, when seeking the cluster of a single vertex, we compute it without using any global knowledge of the graph, iteratively examining the vicinity of the seed vertex. Similarly, we propose methods for sampling and spanning-tree construction according to certain criteria on the outcome without requiring knowledge of the graph as a whole. Our motivation for concentrating on local methods is two-fold: one driving factor is the ever-increasing size of real-world problems, but an equally important fact is the nonuniformity present in many natural graph instances; properties that hold for the entire graph are often lost when only a small subgraph is examined.

Item Algorithms for XML filtering(Aalto University, 2011) Silvasti, Panu; Sippu, Seppo; Soisalon-Soininen, Eljas, Prof.; Tietotekniikan laitos; Department of Computer Science and Engineering; Perustieteiden korkeakoulu
In a publish-subscribe system based on XML filtering, the subscriber profiles are usually specified by filters written in the XPath language. The system processes the stream of XML documents and delivers to subscribers a notification or the content of those documents that match the filters. The number of interested subscribers and their stored profiles can be very large, thousands or even millions. In this case, the scalability of the system is critical. In this thesis, we develop several algorithms for XML filtering with linear XPath expressions. The algorithms are based on a backtracking Aho-Corasick pattern-matching automaton (PMA) built from "keywords" extracted from the filters, where a keyword is a maximal substring consisting only of XML element names. The output function of the PMA indicates which keyword occurrences of which filter are recognized at a given state. Our best results have been obtained by using a dynamically changing output function, which is updated during the processing of the input document. We have conducted an extensive performance study in which we compared our filtering algorithms with YFilter and the lazy DFA, two well-known automata-based filtering methods. With a non-recursive XML data set, PMA-based filtering is tens of times more efficient than YFilter and also significantly more efficient than the lazy DFA.
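The core of the PMA-based filtering described above is Aho-Corasick matching of element-name keywords against the document's element stream. The sketch below builds a plain (non-backtracking) Aho-Corasick automaton over keywords given as contiguous sequences of element names and reports which keyword of which filter ends at each stream position. The thesis's backtracking PMA, the extraction of keywords from XPath filters, and the dynamically changing output function are not reproduced here.

```python
from collections import deque

def build_pma(filters):
    """Aho-Corasick automaton whose 'alphabet' consists of XML element names;
    each filter is given as a tuple of element names (one keyword)."""
    goto, fail, output = [{}], [0], [set()]
    for idx, word in enumerate(filters):
        state = 0
        for name in word:
            if name not in goto[state]:
                goto[state][name] = len(goto)
                goto.append({})
                fail.append(0)
                output.append(set())
            state = goto[state][name]
        output[state].add(idx)
    queue = deque(goto[0].values())        # breadth-first failure-link setup
    while queue:
        s = queue.popleft()
        for name, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and name not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(name, 0)
            output[t] |= output[fail[t]]
    return goto, fail, output

def run_pma(automaton, element_stream):
    """Report (position, filter index) whenever a filter's keyword sequence
    ends at that position of the element-name stream."""
    goto, fail, output = automaton
    state, hits = 0, []
    for pos, name in enumerate(element_stream):
        while state and name not in goto[state]:
            state = fail[state]
        state = goto[state].get(name, 0)
        hits.extend((pos, idx) for idx in output[state])
    return hits

filters = [("catalog", "book", "title"), ("book", "title")]
print(run_pma(build_pma(filters), ["catalog", "book", "title"]))
```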
With a slightly recursive data set PMA-based filtering has the same performance as the lazy DFA and it is significantly more efficient than YFilter. We have also developed an optimization method called filter pruning. This method improves the performance of filtering by utilizing knowledge about the XML document type definition (DTD) to simplify the filters. The optimization algorithm takes as input a DTD and a set of linear XPath filters and produces a set of pruned linear XPath filters that contain as few wildcards and descendant operators as possible. With a non-recursive data set and with a slightly recursive data set the filter-pruning method yielded a tenfold increase in the filtering speed of the PMA-based algorithms and a hundredfold increase with YFilter and the lazy DFA. Filter pruning can also increase the filtering speed in the case of highly recursive data sets.Item The application of dynamic virtual prototyping to the development of control systems(Helsinki University of Technology, 2006-06-14) Krassi, Boris; Department of Computer Science and Engineering; Tietotekniikan osasto; Industrial Information Technology Laboratory; Teollisuuden tietotekniikan laboratorioA new method of developing control systems on the basis of dynamic virtual prototyping (DVP) has been proposed. The method facilitates the control system development process by means of (1) automating the transition from the conventional DVP of a plant to the dynamic model suitable for the control system design and (2) integrating the development process into the overall lifecycle. The method comprises the three principal stages: (1) representing the plant by its DVP; (2) simulating the DVP and generating the data-based model of the plant; (3) designing the control system using the generated data-based model. Stages 1 and 2 are supported by DVP systems (e.g. IGRIP, LMS/VirtualLab, MSC SimOffice), stage 3 is accomplished by CACSD. The proposed development method has been adapted to the class of plants that are linearizable, quasi-stationary, stable or stabilizable without using the analytical model and have lumped parameters. The specifics of applying the conventional methods of identification and control system design in the context of DVP have been revealed and analyzed. The engineering applicability of the method has been proved by means of developing the control system for fine positioning of a gantry crane load.Item Application of spatial sound reproduction in virtual environments : experiments in localization, navigation, and orientation(Helsinki University of Technology, 2006-05-24) Gröhn, Matti; Department of Computer Science and Engineering; Tietotekniikan osasto; Telecommunications Software and Multimedia Laboratory; Tietoliikenneohjelmistojen ja multimedian laboratorioThe topic of this research was spatial sound reproduction in a cave-like virtual room (EVE) of the Helsinki University of Technology. Spatial sound reproduction is widely used, for example, in movie industry and computer games. In virtual environments it has been employed less than visual and tactile modalities. There are several common tasks in virtual reality applications in which spatial audio could be used. This thesis concentrates on localization, navigation, and orientation. This research is one of the first studies in localization accuracy of loudspeaker reproduction in a virtual room. In the localization experiments, subjects pointed to the perceived direction of the sound source. 
According to the measurements, the achieved localization accuracy was at the same level as presented in the literature for headphone reproduction. Localization of the moving sound sources was not as accurate as localization of the static sources. In the navigation experiments, the task of the users was to move from waypoint to waypoint according to the visual and auditory cues. In the first experiment, auditory, visual, and audio-visual conditions were tested, and in the second experiment, different auditory cues were compared. Audio-visual navigation was the most efficient. Analysis of the travel paths indicated that an auditory cue was used at the beginning to locate the direction of the next target, and a visual cue was used in the final approach to the target. In addition, all the subjects could navigate using the auditory cue alone. Auditory navigation performance increased when additional information about the distance and elevation of the target was included in the auditory cues. In the orientation experiment, subjects flew a predefined route inside an architectural model. Their task was to keep the model as balanced as possible during their flight. Three auditory artificial horizons were designed using the "ball on a plate" metaphor. The sound was played from the direction towards which the virtual world was tilted. According to the test results, the designed horizons helped the user to keep the model in an upright position better than without them. Additional results included how the design of the virtual room and the direction indication method affect the measured localization accuracy.

Item Approaches for content-based retrieval of surface defect images(Helsinki University of Technology, 2006-10-20) Pakkanen, Jussi; Department of Computer Science and Engineering; Tietotekniikan osasto; Laboratory of Computer and Information Science; Informaatiotekniikan laboratorio
There are two properties which all industrial manufacturing processes try to optimize: speed and quality. Speed can also be called throughput and tells how many products can be created in a specified time. The higher the speed, the better. Quality means the perceived goodness of the finished product. Broken or defective products simply don't sell, so they must be eliminated. These are conflicting goals. The larger the manufacturing volumes, the less time there is to inspect a single product, or the more inspectors are required. A good example is paper manufacturing. A single paper machine can produce a sheet of paper several meters wide and several hundred kilometers long in just a few hours. It is impossible to inspect these kinds of volumes by hand. In this thesis, the indexing and retrieval of defect images taken by an automated inspection machine is examined. Some of the images taken contain serious defects such as holes, while others are less grave. The goal is to develop automated methods to find the serious fault images from large databases using only the information in the images. This means that there are no annotations. This is called content-based image retrieval, or CBIR. This problem is examined in two different ways. First, the PicSOM CBIR tool's suitability for this task is evaluated. PicSOM is a platform for content-based image retrieval developed at the Laboratory of Computer and Information Science, Helsinki University of Technology. PicSOM has earlier been successfully applied to various different CBIR tasks.
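PicSOM, like several other theses in this listing, builds on the Self-Organizing Map. For reference, a minimal online SOM training loop in NumPy is sketched below; it is a textbook SOM, not PicSOM's tree-structured implementation, and all parameter values are arbitrary.

```python
import numpy as np

def train_som(data, grid=(8, 8), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Plain online SOM: each sample pulls its best-matching unit (BMU) and
    the BMU's grid neighbourhood towards it. Illustrative only."""
    rng = np.random.default_rng(seed)
    h, w = grid
    codebook = rng.standard_normal((h, w, data.shape[1]))
    # Grid coordinates of every unit, used by the neighbourhood function.
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            t = step / n_steps
            lr = lr0 * (1 - t)                       # decaying learning rate
            sigma = sigma0 * (1 - t) + 0.5           # shrinking neighbourhood
            d = np.linalg.norm(codebook - x, axis=-1)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1) / (2 * sigma ** 2))
            codebook += lr * g[..., None] * (x - codebook)
            step += 1
    return codebook

# Toy usage: map 3-D feature vectors onto an 8x8 grid of codebook vectors.
som = train_som(np.random.default_rng(1).standard_normal((500, 3)))
```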
The other part involves developing new algorithms for efficient indexing of large, high-dimensional databases. The Evolving Tree (ETree), a novel hierarchical, tree-shaped, self-organizing neural network, is presented and analyzed. It is noticeably faster than classical methods, while still obtaining good results. The suitability and performance of both CBIR and ETree on this problem are evaluated using several different experiments. The results show that both approaches are applicable to this real-world quality inspection problem with good results.

Item Aspects of modelling and simulation of genetic algorithms : a formal approach(Helsinki University of Technology, 2002-11-01) Sandqvist, Sam; Department of Computer Science and Engineering; Tietotekniikan osasto; Laboratory for Theoretical Computer Science
Genetic algorithms (GAs) are widely used in solving search and optimisation problems involving very large search spaces, or very many variables, where closed-form solutions are impractical due to the very size of the problems. This dissertation combines the two salient features of GAs, namely the temporal aspect of the evolutionary approach to solving problems at the heart of the GA, and the stochastic aspect of evolution arising from its reliance on basically random generation of new individuals with stringent selection in determining survival. The work centres around describing the formal modelling of GAs using a logical approach based on standard first-order logic combined with temporal logic and with probabilistic logic. These logics are combined into a unified logic, temporal-probabilistic logic (TPL), which is formulated in this work. The GA is then described using TPL as the main tool, and the working of the GA is detailed from its components to the actual processes by formulating a model of the GA. Several important parameters are described and analysed, as is the important mechanism of selection. A simple axiomatisation of the GA using TPL is described as well. Also presented are a simulation of the workings of the genetic algorithm based on high-level Petri nets and experimentation with a genetic algorithm package, providing experimental evidence centring on the various selection mechanisms for some of the theoretical results.

Item Aspects on availability : a teleological adventure of information in the lifeworld(Helsinki University of Technology, 2004-12-17) Kuusisto, Rauno; Department of Computer Science and Engineering; Tietotekniikan osasto; Telecommunications Software and Multimedia Laboratory; Tietoliikenneohjelmistojen ja multimedian laboratorio
Availability is a feature of information. Availability is frequently connected to information security. So it is now. But in this thesis, the viewpoint is slightly different from the usually understood context of information security. Here, availability is dealt with as a compulsory feature of information, which is essential for making decisions that are as good as possible about shaping the future. Security is understood as a state, which will be reached by using information that is available. The approach to security in this thesis is broadly understood as "making security with available information". The overall frame of this thesis is based on European philosophy represented by Henri Bergson, Martin Heidegger, Karl Popper, Hans-Georg Gadamer and Jürgen Habermas. The philosophical approach is chosen to be able to deal with the rather broad and abstract wholeness of the overall problem.
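The genetic algorithms item above characterizes a GA by random generation of new individuals and stringent selection. A textbook bitstring GA with tournament selection, one-point crossover and bit-flip mutation, solving the toy "one-max" problem, is sketched below; it illustrates only those generic ingredients and has nothing to do with the TPL formalization developed in that thesis.

```python
import random

def run_ga(n_bits=40, pop_size=60, generations=80, p_mut=0.02, seed=0):
    """Textbook GA maximizing the number of 1-bits ('one-max'):
    tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    fitness = sum                                   # count of 1-bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def tournament():
            return max(rng.sample(pop, 3), key=fitness)
        nxt = []
        while len(nxt) < pop_size:
            a, b = tournament(), tournament()
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = run_ga()
print(sum(best), "of", len(best), "bits set")
```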
The philosophical approach is something new for the researcher as well, thus offering new challenges and an opportunity to learn something new and broaden the worldview. The philosophical background is supplemented with theories about time, information and knowledge management, and problem solving to reach more practical solutions. Finally, a representative entity of the information availability environment is analysed to validate the philosophical-theoretical construction. The hypothesis of this thesis is formulated around the theories of time, information and problem solving, concluded with the description of social systems given in the theory of communicative action by Jürgen Habermas. The research questions deal with the combination of time and information, aiming at an understanding from which to construct a model that is used to solve information availability issues in a temporally demanding decision-making environment. This thesis is hermeneutical, reaching towards understanding those phenomena that concern information availability. The features of availability that are dealt with are connected to information itself and the purpose of using information, the refining process of information in purposeful action, and time from a few viewpoints. All this is connected in the context of a social system. On the basis of all this, a suitably abstracted model is constructed to understand how information can be used and what kinds of aspects availability contains. On the basis of that, a complex entity of information availability in a temporally demanding decision-making environment is analysed. This entity contains the temporal and meaningful divergence of information. This system is solved on the basis of the theoretical frame. A new kind of solution for modelling the use of available information in a temporally demanding decision-making environment is constructed. Finally, a number of new research questions emerge, which are well suited to the science-philosophical frame adopted in this thesis.

Item Attribute trees as adaptive object models in image analysis(Helsinki University of Technology, 2001-02-23) Peura, Markus; Department of Computer Science and Engineering; Tietotekniikan osasto; Laboratory of Computer and Information Science, Neural Networks Research Centre; Informaatiotekniikan laboratorio
This thesis focuses on the analysis of irregular hierarchical visual objects. The main approach involves modelling the objects as unordered attribute trees. A tree presents the overall organization, or topology, of an object, while the vertex attributes describe further visual properties such as intensity, color, or size. Techniques for extracting, matching, comparing, and interpolating attribute trees are presented, principally aiming at systems that can learn to recognize objects. Analysis of weather radar images has been the pilot application for this study, but the main ideas are applicable in structural pattern recognition more generally. The central original contribution of this thesis is the Self-Organizing Map of Attribute Trees (SOM-AT), which demonstrates how the proposed tree processing techniques - tree indexing, matching, distance functions, and mixtures - can be embedded in a learning system; the SOM-AT is a variant of the Self-Organizing Map (SOM), the neural network model invented by Prof. Teuvo Kohonen. The SOM is especially suited to visualizing distributions of objects, classifying objects and monitoring changes in objects.
Hence, the SOM-AT can be applied in the corresponding tasks involving hierarchical objects. More generally, the proposed ideas are applicable in learning systems involving comparisons and interpolations of trees. The suggested heuristic index-based tree matching scheme is another independent contribution. The basic idea of the heuristic is to divide trees into subtrees and to match the subtrees recursively by means of topological indices. Given two attribute trees, the larger of which has N vertices, and the maximal child count (out-degree) is D, the complexity of the scheme is only O(N log D) operations, while exact matching schemes seem to have at least quadratic complexity: O(N^2.5) operations in checking isomorphisms and O(N^3) operations in matching attribute trees. The proposed scheme is also efficient in terms of its "hit rate", the number of successfully matched vertices. In matching two random trees of N <= 10 vertices, the number of heuristically matched vertices is on average 99% of the optimum, and the accuracy for trees of N <= 500 vertices is still above 90%. The feasibility of the proposed techniques is demonstrated by database querying, monitoring, and clustering of weather radar images. A related tracking scheme is outlined as well. In addition to weather radar images, a case study is presented on northern lights (Aurora Borealis) images. Due to the generic approach, there are also medical and geographical applications in view.

Item Automata and linear temporal logic : translations with transition-based acceptance(Helsinki University of Technology, 2006-10-27) Tauriainen, Heikki; Department of Computer Science and Engineering; Tietotekniikan osasto; Laboratory for Theoretical Computer Science; Tietojenkäsittelytekniikan laboratorio
Automata theory provides powerful tools for designing and implementing decision procedures for temporal logics and their applications to the automatic verification of systems against their logical specifications. Implementing these decision procedures by making use of automata built from the systems and their specifications with translation procedures is challenging in practice due to the tendency of the automata to grow unmanageably large as the size of the systems or the logical specifications increases. This thesis develops the theory of translating propositional linear time temporal logic (LTL) into nondeterministic automata via self-loop alternating automata. Unlike nondeterministic automata, self-loop alternating automata are expressively equivalent to LTL and allow a conceptually simple translation of LTL specifications into automata using a set of rules for building automata incrementally from smaller components. The use of generalized transition-based acceptance for automata throughout all constructions gives rise to new optimized translation rules and facilitates designing heuristics for the minimization of automata by making use of language containment tests combined with structural analysis of automata. The generalized definition also supports the translation of self-loop alternating automata into nondeterministic automata by essentially applying the standard subset construction; this construction can be further simplified and optimized when working with automata built from LTL formulas.
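The standard subset construction mentioned just above can be stated in a few lines for an ordinary NFA; the sketch below omits the acceptance-condition bookkeeping needed for self-loop alternating automata and is only meant to show where the potential exponential growth in the number of states comes from.

```python
from itertools import chain

def subset_construction(alphabet, delta, initial, accepting):
    """Standard NFA -> DFA subset construction.
    delta: dict mapping (state, symbol) -> set of successor states."""
    start = frozenset([initial])
    dfa_delta, worklist, seen = {}, [start], {start}
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            T = frozenset(chain.from_iterable(delta.get((q, a), ()) for q in S))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                worklist.append(T)
    dfa_accepting = {S for S in seen if S & accepting}
    return seen, dfa_delta, start, dfa_accepting

# Toy NFA over {a, b} accepting words whose second-to-last symbol is 'a'.
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
states, dfa_delta, start, acc = subset_construction("ab", delta, 0, {2})
print(len(states), "DFA states")
```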
The translation rules can also be used to identify a syntactic subclass of LTL for which the exponential increase caused by the subset construction in the number of states of the automaton can always be avoided; consequently, the satisfiability problem for this subclass, which is shown to extend related subclasses known from the literature, is NP-complete. Additionally, the emptiness of generalized nondeterministic automata is shown to be testable without giving up generalized transition-based acceptance by using a new variant of the well-known nested depth-first search algorithm with improved worst-case resource requirements.
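The emptiness check referred to above builds on the nested depth-first search. For orientation, the classical nested DFS for ordinary Büchi automata (a single, state-based acceptance set) is sketched below; the thesis's variant, which handles generalized transition-based acceptance with improved worst-case resource requirements, is not reproduced here.

```python
def buchi_nonempty(initial, successors, accepting):
    """Classical nested depth-first search: returns True iff some accepting
    state is both reachable and lies on a cycle, i.e. the Buchi automaton
    accepts some infinite word. Recursive for brevity (small automata only)."""
    visited, flagged = set(), set()

    def inner_dfs(s, seed):
        # Nested search: look for a path leading back to the accepting seed.
        for t in successors(s):
            if t == seed:
                return True
            if t not in flagged:
                flagged.add(t)
                if inner_dfs(t, seed):
                    return True
        return False

    def outer_dfs(s):
        visited.add(s)
        for t in successors(s):
            if t not in visited and outer_dfs(t):
                return True
        # In post-order, start the nested search from accepting states.
        return s in accepting and inner_dfs(s, s)

    return outer_dfs(initial)

# Toy automaton: 0 -> 1 -> 2 -> 1, with state 1 accepting (a reachable cycle).
succ = {0: [1], 1: [2], 2: [1]}
print(buchi_nonempty(0, lambda s: succ.get(s, []), accepting={1}))
```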