Anomaly Detection from Patient Visit Data
School of Science | Master's thesis
Machine Learning and Data Mining
Master’s Programme in Computer, Communication and Information Sciences
Abstract
Hospital operating costs are rising because a growing elderly population is increasing the demand for outpatient services. To reduce these costs and serve patients better, healthcare institutions need to become more efficient. Among the potential areas for efficiency improvement, smoother patient visits are particularly desirable. In the digital era, patient visits to a hospital can be recorded in full detail. Oulu Hospital in Finland has been collecting patient visit data since 2011 using a queue system provided by the company X-Akseli. Using these data, this thesis aims to design a practical way of detecting anomalies in patient visits. With such a system, hospital administrative staff could analyze the performance of the hospital's queue procedure and optimize it. Moreover, the system can identify anomalies in real time, so that patients can get immediate help when they need it. The thesis explores two categories of methods: clustering methods and generative methods. Four candidate algorithms are discussed: K-Means, DBSCAN, Markov Chain, and Hidden Markov Model. The discussion suggests that DBSCAN and the Hidden Markov Model are the more practical choices. We then propose a new data representation and use a negative binomial distribution in the Hidden Markov Model to model the durations of patient states. The experimental results were visualized using t-SNE and evaluated through user interpretation. The analyses show that both DBSCAN and the Hidden Markov Model can effectively detect anomalies in patient visit data. However, in terms of time and space complexity, and of real-time detection, the Hidden Markov Model is the better choice.
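The central modelling choice above, a negative binomial rather than a Poisson distribution for state durations, can be illustrated in isolation. The sketch below is not the thesis's implementation: it fits a negative binomial to hypothetical waiting times (whole minutes) for a single queue state using the method of moments, an assumption made here for simplicity (the thesis may estimate the parameters differently, for example inside the HMM training procedure), and flags durations whose log-probability falls below a hypothetical threshold. The function names, the sample data, and the `-8.0` threshold are all illustrative assumptions.

```python
import math


def fit_negative_binomial(durations):
    """Method-of-moments fit of NB(r, p) over counts k = 0, 1, 2, ...

    Uses mean = r(1-p)/p and variance = r(1-p)/p^2, which gives
    p = mean/variance and r = mean^2/(variance - mean).
    Only valid for over-dispersed data (variance > mean).
    """
    n = len(durations)
    m = sum(durations) / n
    v = sum((d - m) ** 2 for d in durations) / n
    if v <= m:
        raise ValueError("data not over-dispersed; a Poisson model may suffice")
    return m * m / (v - m), m / v  # (r, p)


def nb_logpmf(k, r, p):
    """log P(K = k) for the number of failures before the r-th success."""
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log1p(-p))


def poisson_logpmf(k, lam):
    """log P(K = k) under a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def flag_unusual(durations, history, threshold=-8.0):
    """Return durations that are improbable under an NB fitted to history.

    The threshold is a hypothetical value chosen for illustration only.
    """
    r, p = fit_negative_binomial(history)
    return [d for d in durations if nb_logpmf(d, r, p) < threshold]


# Hypothetical waiting times (minutes) observed for one queue state.
history = [3, 5, 4, 12, 6, 2, 8, 15, 5, 4, 7, 20, 3, 6, 9, 5]
print(flag_unusual([5, 20, 45], history))  # prints [45]
```

The design point is over-dispersion: a Poisson model forces the variance to equal the mean, whereas the negative binomial's extra parameter lets the variance exceed it, which suits waiting times with occasional long delays. In an explicit-duration HMM, a per-state term like `nb_logpmf` would contribute to the likelihood of the whole visit sequence rather than serve as a standalone score, and durations are often shifted so that the minimum is one time step.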
Thesis advisor: Hollmén, Jaakko
Keywords: sequence data, clustering, generative Markov models, duration modelling, Poisson distribution, negative binomial distribution