Browsing by Author "Sahlsten, Jaakko"
- Active learning and interactive training for retinal image classification
School of Science | Master's thesis (2018-06-18) Sahlsten, Jaakko
The goal of this study is to investigate the application of deep learning and human-computer interaction to diagnosing diabetic retinopathy from colour fundus images. We apply deep learning and study the effects of network pretraining, active learning, and personalised annotation on a private dataset. Diabetic retinopathy is a global issue, with the number of patients and screening cases increasing each year. As a result, an increasing number of fundus images must be scanned and diagnosed for diabetic retinopathy and other eye diseases, constituting a major expenditure of ophthalmologists' time. To aid and speed up these growing diagnosis and annotation tasks, a machine learning solution is suggested for automatic diagnosis of diabetic retinopathy from colour fundus images. A state-of-the-art deep neural network has been demonstrated to match ophthalmologist performance in diagnosing referable diabetic retinopathy when trained on tens of thousands of colour fundus images and associated labels. In this work, the state-of-the-art model was deployed using a smaller dataset. The model was trained both from random initialisation and from weights pretrained on the ImageNet dataset, and fine-tuning the pretrained network was compared to training from scratch on two test sets. The fine-tuned model had an area under the receiver operating characteristic curve (ROC AUC) of 0.965 and 0.921, while the model trained from random initialisation had ROC AUC of 0.962 and 0.879. Active learning is a well-studied subfield of machine learning that has been applied successfully; however, there is limited literature on applying it to high-dimensional data with deep neural networks. In this work, recent active learning solutions were applied to diabetic retinopathy classification in order to reduce the dataset size required to reach ophthalmologist-level performance in classifying referable diabetic retinopathy in a screening setting. The solution reached this threshold with 8,700 images, compared to 10,500 images required with random sampling. Finally, a model was developed to learn user preferences in annotation with the help of a pretrained network. The trained model was compared to a reference model with no human feedback and evaluated on subjective and objective performance. Anecdotal testing showed that the tool was able to learn subjective gradability to some extent; however, it did not provide additional benefits in the subjective classification of retinopathy.
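To illustrate the pretraining comparison described in this abstract, the following minimal sketch contrasts the two initialisations and evaluates with ROC AUC. It is not the thesis code; the Inception-v3 backbone (used in the cited state-of-the-art work) and the data loader are assumptions.

```python
# Minimal sketch (not the thesis code): fine-tuning ImageNet-pretrained
# weights vs. training from random initialisation, evaluated with ROC AUC.
import torch
import torch.nn as nn
from torchvision import models
from sklearn.metrics import roc_auc_score

def build_model(pretrained: bool) -> nn.Module:
    weights = models.Inception_V3_Weights.IMAGENET1K_V1 if pretrained else None
    model = models.inception_v3(weights=weights)
    model.fc = nn.Linear(model.fc.in_features, 1)  # binary head: referable DR
    return model

@torch.no_grad()
def evaluate_roc_auc(model: nn.Module, loader, device: str = "cpu") -> float:
    model.eval().to(device)
    scores, labels = [], []
    for images, targets in loader:  # loader yields (image batch, 0/1 labels)
        logits = model(images.to(device)).squeeze(1)
        scores += torch.sigmoid(logits).cpu().tolist()
        labels += targets.tolist()
    return roc_auc_score(labels, scores)
```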
- Applicability and Robustness of Deep Learning in Healthcare
School of Science | Doctoral dissertation (article-based) (2024) Sahlsten, Jaakko
The worldwide population is aging, causing increased demand for healthcare and motivating the goal of reducing the burden on health professionals while maintaining the expected level of care. Deep learning (DL)-based methods, especially convolutional neural networks (CNNs), have achieved state-of-the-art performance in various classification and segmentation tasks on imaging data. Thus, there is interest in applying these methods to automate routine, laborious, or time-consuming clinical tasks based on medical imaging. However, conventional DL approaches may not be trustworthy enough for use in healthcare due to their limited explainability, overconfidence, and sensitivity to distribution shifts. In this thesis, the applicability of DL approaches in healthcare is investigated in the clinical tasks of screening and medical image segmentation. The approaches are evaluated for robustness to distribution shifts with in-distribution and out-of-distribution datasets, including other imaging centers, other devices, and defacing techniques. To address the lack of explainability and the overconfidence, approximate Bayesian neural networks with novel uncertainty measures are applied to the tasks and systematically evaluated in terms of performance and uncertainty quantification. The thesis first introduces the deep learning paradigm and its practical usage in the investigated medical imaging tasks. The following part describes uncertainty quantification in deep learning, its downstream utilization in the clinical workflow, and current approaches to approximate Bayesian deep learning. The next part summarises the included publications and related work. The last part presents the conclusion and a discussion of the analysis, its limitations, and proposed future research to improve the trustworthiness and applicability of deep learning techniques for imaging in healthcare. The publications demonstrated that CNN-based DL methods have clinically acceptable performance in the evaluated tasks on in-distribution data. However, robustness to distribution shift varied by task, for example robustness to other imaging devices but sensitivity to defacing in segmentation. In terms of explainability and overconfidence, approximate Bayesian deep learning and the novel uncertainty measures demonstrated improved utility of uncertainty compared to conventional approaches in both tasks.
- Application of simultaneous uncertainty quantification and segmentation for oropharyngeal cancer use-case with Bayesian deep learning
A1 Original article in a scientific journal (2024-12) Sahlsten, Jaakko; Jaskari, Joel; Wahid, Kareem A.; Ahmed, Sara; Glerean, Enrico; He, Renjie; Kann, Benjamin H.; Mäkitie, Antti; Fuller, Clifton D.; Naser, Mohamed A.; Kaski, Kimmo
Background: Radiotherapy is a core treatment modality for oropharyngeal cancer (OPC), where the primary gross tumor volume (GTVp) is manually segmented with high interobserver variability. This calls for reliable and trustworthy automated tools in the clinician workflow; therefore, accurate uncertainty quantification and its downstream utilization are critical. Methods: Here we propose uncertainty-aware deep learning for OPC GTVp segmentation and illustrate the utility of uncertainty in multiple applications. We examine two Bayesian deep learning (BDL) models and eight uncertainty measures, and utilize a large multi-institute dataset of 292 PET/CT scans to systematically analyze our approach. Results: We show that our uncertainty-based approach accurately predicts the quality of the deep learning segmentation in 86.6% of cases, identifies low-performance cases for semi-automated correction, and visualizes the regions of a scan where the segmentation is likely to fail. Conclusions: Our BDL-based analysis provides a first step towards more widespread implementation of uncertainty quantification in OPC GTVp segmentation.
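For context on how a BDL model can yield the voxel-level uncertainty maps described above, here is a minimal Monte Carlo dropout sketch; the paper's two BDL models and eight uncertainty measures are not reproduced, and the helper below is illustrative.

```python
# Minimal sketch of one BDL family (Monte Carlo dropout) producing a
# voxelwise uncertainty map for a binary GTVp segmentation.
import torch

def mc_dropout_segment(model: torch.nn.Module, volume: torch.Tensor,
                       n_samples: int = 20):
    model.train()  # keep dropout stochastic at inference (caveat: also affects BatchNorm)
    with torch.no_grad():
        probs = torch.stack(
            [torch.sigmoid(model(volume)) for _ in range(n_samples)])
    mean_p = probs.mean(dim=0)          # consensus foreground probability
    eps = 1e-8                          # numerical safety for log(0)
    entropy = -(mean_p * (mean_p + eps).log()
                + (1 - mean_p) * (1 - mean_p + eps).log())
    return mean_p, entropy              # segmentation + uncertainty map
```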
- Artificial Intelligence for Radiation Oncology Applications Using Public Datasets
A2 Review article in a scientific journal (2022-10) Wahid, Kareem A.; Glerean, Enrico; Sahlsten, Jaakko; Jaskari, Joel; Kaski, Kimmo; Naser, Mohamed A.; He, Renjie; Mohamed, Abdallah S.R.; Fuller, Clifton D.
Artificial intelligence (AI) has exceptional potential to positively impact the field of radiation oncology. However, large curated datasets, often involving imaging data and corresponding annotations, are required to develop radiation oncology AI models. Importantly, the recent establishment of the Findable, Accessible, Interoperable, Reusable (FAIR) principles for scientific data management has enabled an increasing number of radiation oncology related datasets to be disseminated through data repositories, thereby acting as a rich source of data for AI model building. This manuscript reviews the current and future state of radiation oncology data dissemination, with a particular emphasis on published imaging datasets, AI data challenges, and the associated infrastructure. Moreover, we provide historical context for FAIR data dissemination protocols, discuss difficulties in the current distribution of radiation oncology data, and offer recommendations regarding data dissemination for eventual utilization in AI models. Through FAIR principles and standardized approaches to data dissemination, radiation oncology AI research has nothing to lose and everything to gain.
- Auto-detection and segmentation of involved lymph nodes in HPV-associated oropharyngeal cancer using a convolutional deep learning neural network
A1 Original article in a scientific journal (2022-09) Taku, Nicolette; Wahid, Kareem A.; van Dijk, Lisanne V.; Sahlsten, Jaakko; Jaskari, Joel; Kaski, Kimmo; Fuller, Clifton D.; Naser, Mohamed A.
Purpose: Segmentation of involved lymph nodes on head and neck computed tomography (HN-CT) scans is necessary for the radiotherapy planning of early-stage human papillomavirus (HPV) associated oropharynx cancers (OPC). We aimed to train a deep learning convolutional neural network (DL-CNN) to segment involved lymph nodes on HN-CT scans. Methods: Ground-truth segmentation of involved nodes was performed on pre-surgical HN-CT scans for 90 patients who underwent levels II-IV neck dissection for node-positive HPV-OPC (training/validation [n = 70] and testing [n = 20]). A 5-fold cross-validation approach was used to train five DL-CNN sub-models based on a residual U-net architecture. Validation and testing segmentation masks were compared to ground-truth masks using predetermined metrics. A lymph node auto-detection model to discriminate between “node-positive” and “node-negative” HN-CT scans was developed by thresholding the segmentation model outputs and evaluated using the area under the receiver operating characteristic curve (AUC). Results: In the DL-CNN validation phase, all sub-models yielded segmentation masks with median Dice ≥ 0.90 and median volume similarity ≥ 0.95. In the testing phase, the DL-CNN produced consensus segmentation masks with a median Dice of 0.92 (IQR, 0.89-0.95), median volume similarity of 0.97 (IQR, 0.94-0.99), and median Hausdorff distance of 4.52 mm (IQR, 1.22-8.38). The detection model achieved an AUC of 0.98. Conclusion: The results from this single-institution study demonstrate the successful automation of lymph node segmentation for patients with HPV-OPC using a DL-CNN. Future studies, including validation with an external dataset, are necessary to clarify its role in the larger radiation oncology treatment planning workflow.
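A minimal sketch of the detection step described above, assuming the per-scan score is the predicted node volume obtained by thresholding the segmentation output; the exact scoring rule used in the study is not reproduced here.

```python
# Minimal sketch: turn segmentation outputs into a scan-level
# "node-positive" score and evaluate detection with ROC AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

def node_positive_score(prob_volume: np.ndarray, threshold: float = 0.5) -> float:
    """Score a scan by its predicted involved-node voxel count (assumption)."""
    return float((prob_volume > threshold).sum())

def detection_auc(prob_volumes, labels) -> float:
    """labels: 1 for node-positive scans, 0 for node-negative."""
    scores = [node_positive_score(v) for v in prob_volumes]
    return roc_auc_score(labels, scores)
```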
- Comparison of deep learning segmentation and multigrader-annotated mandibular canals of multicenter CBCT scans
A1 Original article in a scientific journal (2022-12) Järnstedt, Jorma; Sahlsten, Jaakko; Jaskari, Joel; Kaski, Kimmo; Mehtonen, Helena; Lin, Ziyuan; Hietanen, Ari; Sundqvist, Osku; Varjonen, Vesa; Mattila, Vesa; Prapayasotok, Sangsom; Nalampang, Sakarat
Deep learning approaches have been demonstrated to automatically segment the bilateral mandibular canals from CBCT scans, yet systematic studies of their clinical and technical validation are scarce. To validate the mandibular canal localization accuracy of a deep learning system (DLS), we trained it with 982 CBCT scans and evaluated it using 150 scans from five scanners, drawn from clinical-workflow patients of European and Southeast Asian institutes and annotated by four radiologists. The interobserver variability was compared to the variability between the DLS and the radiologists. In addition, the generalisation of the DLS to CBCT scans from scanners not present in the training data was examined to evaluate its out-of-distribution performance. The DLS showed statistically significantly (p < 0.001) lower variability against the radiologists (0.74 mm) than the interobserver variability (0.77 mm), and it generalised to new devices with 0.63 mm, 0.67 mm, and 0.87 mm (p < 0.001). Against the radiologists' consensus segmentation, used as the gold standard, the DLS achieved a symmetric mean curve distance of 0.39 mm, statistically significantly (p < 0.001) lower than those of the individual radiologists, 0.62 mm, 0.55 mm, 0.47 mm, and 0.42 mm. These results show promise for integrating the DLS into the clinical workflow to reduce time-consuming and labour-intensive manual tasks in implantology.
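The symmetric mean curve distance quoted above can be illustrated with the following sketch on canal centrelines sampled as point sequences; the study's exact sampling and averaging details are assumptions here.

```python
# Minimal sketch of a symmetric mean curve distance between two mandibular
# canal centrelines given as (N, 3) point arrays in millimetres.
import numpy as np

def symmetric_mean_curve_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Pairwise distances between every point on curve a and curve b.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean nearest-neighbour distance in both directions, then their average.
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```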
- Deep learning auto-segmentation of cervical skeletal muscle for sarcopenia analysis in patients with head and neck cancer
A1 Original article in a scientific journal (2022-07-28) Naser, Mohamed A.; Wahid, Kareem A.; Grossberg, Aaron J.; Olson, Brennan; Jain, Rishab; El-Habashy, Dina; Dede, Cem; Salama, Vivian; Abobakr, Moamen; Mohamed, Abdallah S.R.; He, Renjie; Jaskari, Joel; Sahlsten, Jaakko; Kaski, Kimmo; Fuller, Clifton D.
Background/Purpose: Sarcopenia is a prognostic factor in patients with head and neck cancer (HNC). Sarcopenia can be determined using the skeletal muscle index (SMI), calculated from cervical neck skeletal muscle (SM) segmentations. However, SM segmentation requires manual input, which is time-consuming and variable. Therefore, we developed a fully automated approach to segment cervical vertebra SM. Materials/Methods: 390 HNC patients with contrast-enhanced CT scans were utilized (300 training, 90 testing). Ground-truth single-slice SM segmentations at the C3 vertebra were manually generated. A multi-stage deep learning pipeline was developed in which a 3D ResUNet auto-segmented the C3 section (33 mm window), the middle slice of the section was auto-selected, and a 2D ResUNet auto-segmented the selected slice. Both the 3D and 2D approaches trained five sub-models (5-fold cross-validation), and sub-model predictions on the test set were combined using majority-vote ensembling. Model performance was primarily determined using the Dice similarity coefficient (DSC). Predicted SMI was calculated using the auto-segmented SM cross-sectional area. Finally, using established SMI cutoffs, we performed a Kaplan-Meier analysis to determine associations with overall survival. Results: Mean test set DSCs of the 3D and 2D models were 0.96 and 0.95, respectively. Predicted SMI correlated highly with ground-truth SMI in males and females (r > 0.96). Predicted SMI stratified patients for overall survival in males (log-rank p = 0.01) but not females (log-rank p = 0.07), consistent with ground-truth SMI. Conclusion: We developed a high-performance, multi-stage, fully automated approach to segment cervical vertebra SM. Our study is an essential step towards fully automated sarcopenia-related decision-making in patients with HNC.
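The SMI computation referenced above reduces to a cross-sectional area normalised by height squared; a minimal sketch follows, in which the pixel-spacing handling is an assumption and any clinical C3-to-L3 conversion is omitted.

```python
# Minimal sketch of SMI from a binary C3 skeletal-muscle mask:
# cross-sectional area (cm^2) divided by height squared (m^2).
import numpy as np

def skeletal_muscle_index(mask: np.ndarray, spacing_mm: tuple[float, float],
                          height_m: float) -> float:
    """mask: 2D binary segmentation; spacing_mm: (row, col) pixel spacing."""
    area_cm2 = mask.astype(bool).sum() * spacing_mm[0] * spacing_mm[1] / 100.0
    return area_cm2 / height_m ** 2
```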
- Deep learning for 3D cephalometric landmarking with heterogeneous multi-center CBCT dataset
A1 Original article in a scientific journal (2024-06) Sahlsten, Jaakko; Järnstedt, Jorma; Jaskari, Joel; Naukkarinen, Hanna; Mahasantipiya, Phattaranant; Charuakkra, Arnon; Vasankari, Krista; Hietanen, Ari; Sundqvist, Osku; Lehtinen, Antti; Kaski, Kimmo
Cephalometric analysis is a critically important and common procedure prior to orthodontic treatment and orthognathic surgery. Recently, deep learning approaches have been proposed for automatic 3D cephalometric analysis based on landmarking of CBCT scans. However, these approaches have relied on uniform datasets from a single center or imaging device, without considering patient ethnicity. In addition, previous works have considered a limited number of clinically relevant cephalometric landmarks, and the approaches were computationally infeasible; both factors impair integration into the clinical workflow. Here, our aim is to analyze the clinical applicability of a lightweight deep learning neural network for fast localization of 46 clinically significant cephalometric landmarks on multi-center, multi-ethnic, and multi-device data consisting of 309 CBCT scans from Finnish and Thai patients. Our approach localized landmarks with a mean distance of 1.99 ± 1.55 mm for the Finnish cohort and 1.96 ± 1.25 mm for the Thai cohort. The localization was clinically acceptable, i.e., within 2 mm, for 61.7% and 64.3% of the landmarks in the Finnish and Thai cohorts, respectively. Furthermore, the estimated landmarks were used to measure cephalometric characteristics successfully, i.e., with ≤ 2 mm or ≤ 2° error, in 85.9% of the Finnish and 74.4% of the Thai cases. Between the two patient cohorts, 33 of the landmarks and all cephalometric characteristics showed no statistically significant difference (at the p < 0.05 level) as measured by the Mann-Whitney U test with Benjamini-Hochberg correction. Moreover, our method is computationally light, providing predictions with mean durations of 0.77 s and 2.27 s using single-machine GPU and CPU computing, respectively. Our findings advocate the inclusion of this method in clinical settings based on its technical feasibility and robustness across varied clinical datasets.
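The landmark evaluation above (mean distance and the 2 mm clinical threshold) corresponds to straightforward point-wise errors; a minimal sketch, with the array shapes as assumptions:

```python
# Minimal sketch of the landmark metrics quoted above: Euclidean error per
# landmark (mm) and the fraction within a 2 mm clinical threshold.
import numpy as np

def landmark_errors(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """pred, gt: (n_landmarks, 3) coordinates in mm."""
    return np.linalg.norm(pred - gt, axis=1)

def within_threshold(errors: np.ndarray, threshold_mm: float = 2.0) -> float:
    return float((errors <= threshold_mm).mean())
```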
- Deep Learning Fundus Image Analysis for Diabetic Retinopathy and Macular Edema Grading
A1 Original article in a scientific journal (2019-12-01) Sahlsten, Jaakko; Jaskari, Joel; Kivinen, Jyri; Turunen, Lauri; Jaanio, Esa; Hietala, Kustaa; Kaski, Kimmo
Diabetes is a globally prevalent disease that can cause visible microvascular complications, such as diabetic retinopathy and macular edema, in the human eye retina, images of which are today used for manual disease screening and diagnosis. This labor-intensive task could greatly benefit from automatic detection using deep learning techniques. Here we present a deep learning system that identifies referable diabetic retinopathy comparably to, or better than, systems presented in previous studies, despite using only a small fraction (<1/4) of the images for training, aided by higher image resolution. We also provide novel results for five different screening and clinical grading systems for diabetic retinopathy and macular edema classification, including state-of-the-art results for accurately classifying images according to the clinical five-grade diabetic retinopathy scale and, for the first time, the four-grade diabetic macular edema scale. These results suggest that a deep learning system could increase the cost-effectiveness of screening and diagnosis while attaining higher than recommended performance, and that the system could be applied in clinical examinations requiring finer grading.
- Deep Learning Method for Mandibular Canal Segmentation in Dental Cone Beam Computed Tomography Volumes
A1 Original article in a scientific journal (2020-12-01) Jaskari, Joel; Sahlsten, Jaakko; Järnstedt, Jorma; Mehtonen, Helena; Karhu, Kalle; Sundqvist, Osku; Hietanen, Ari; Varjonen, Vesa; Mattila, Vesa; Kaski, Kimmo
Accurate localisation of the mandibular canals in the lower jaw is important in dental implantology, where the implant position and dimensions are currently determined manually from 3D CT images by medical experts to avoid damaging the mandibular nerve inside the canal. Here we present a deep learning system for automatic localisation of the mandibular canals that applies a fully convolutional neural network to a clinically diverse dataset of 637 cone beam CT volumes, in which the mandibular canals were coarsely annotated by radiologists, and uses a dataset of 15 volumes with accurate voxel-level mandibular canal annotations for model evaluation. We show that our deep learning model, trained on the coarsely annotated volumes, localises the mandibular canals of the voxel-level annotated set highly accurately, with a mean curve distance of 0.56 mm and an average symmetric surface distance of 0.45 mm. These unparalleled results highlight that deep learning integrated into the dental implantology workflow could significantly reduce manual labour in mandibular canal annotation.
- Development and Validation of an Automated Image-Based Deep Learning Platform for Sarcopenia Assessment in Head and Neck Cancer
A1 Original article in a scientific journal (2023-08-01) Ye, Zezhong; Saraf, Anurag; Ravipati, Yashwanth; Hoebers, Frank; Catalano, Paul J.; Zha, Yining; Zapaishchykova, Anna; Likitlersuang, Jirapat; Guthier, Christian; Tishler, Roy B.; Schoenfeld, Jonathan D.; Margalit, Danielle N.; Haddad, Robert I.; Mak, Raymond H.; Naser, Mohamed; Wahid, Kareem A.; Sahlsten, Jaakko; Jaskari, Joel; Kaski, Kimmo; Mäkitie, Antti A.; Fuller, Clifton D.; Aerts, Hugo J.W.L.; Kann, Benjamin H.
Importance: Sarcopenia is an established prognostic factor in patients with head and neck squamous cell carcinoma (HNSCC); image-based quantification of sarcopenia is typically achieved through the skeletal muscle index (SMI), which can be derived from cervical skeletal muscle segmentation and cross-sectional area. However, manual muscle segmentation is labor-intensive, prone to interobserver variability, and impractical for large-scale clinical use. Objective: To develop and externally validate a fully automated, image-based deep learning platform for cervical vertebral muscle segmentation and SMI calculation, and to evaluate associations with survival and treatment toxicity outcomes. Design, Setting, and Participants: For this prognostic study, a model development data set was curated from publicly available and deidentified data from patients with HNSCC treated at MD Anderson Cancer Center between January 1, 2003, and December 31, 2013. A total of 899 patients undergoing primary radiation for HNSCC with abdominal computed tomography scans and complete clinical information were selected. An external validation data set was retrospectively collected from patients undergoing primary radiation therapy between January 1, 1996, and December 31, 2013, at Brigham and Women's Hospital. The data analysis was performed between May 1, 2022, and March 31, 2023. Exposure: C3 vertebral skeletal muscle segmentation during radiation therapy for HNSCC. Main Outcomes and Measures: Overall survival and treatment toxicity outcomes of HNSCC. Results: The total patient cohort comprised 899 patients with HNSCC (median [range] age, 58 [24-90] years; 140 female [15.6%] and 755 male [84.0%]). Dice similarity coefficients for the validation set (n = 96) and internal test set (n = 48) were 0.90 (95% CI, 0.90-0.91) and 0.90 (95% CI, 0.89-0.91), respectively, with a mean 96.2% acceptability rate between 2 reviewers on external clinical testing (n = 377). Estimated cross-sectional area and SMI values were associated with manually annotated values (Pearson r = 0.99; P < .001) across data sets. On multivariable Cox proportional hazards regression, SMI-derived sarcopenia was associated with worse overall survival (hazard ratio, 2.05; 95% CI, 1.04-4.04; P = .04) and longer feeding tube duration (median [range], 162 [6-1477] vs 134 [15-1255] days; hazard ratio, 0.66; 95% CI, 0.48-0.89; P = .006) than no sarcopenia. Conclusions and Relevance: This prognostic study's findings show external validation of a fully automated deep learning pipeline that accurately measures sarcopenia in HNSCC, with an association with important disease outcomes. The pipeline could enable the integration of sarcopenia assessment into clinical decision-making for individuals with HNSCC.
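The survival association reported above can be illustrated with a log-rank test comparing sarcopenic and non-sarcopenic groups; a minimal sketch using the lifelines package (the study's multivariable Cox model is not reproduced, and the variable names are illustrative):

```python
# Minimal sketch: compare overall survival between SMI-derived sarcopenia
# groups with a log-rank test from the lifelines package.
import numpy as np
from lifelines.statistics import logrank_test

def sarcopenia_logrank_p(time_days: np.ndarray, event: np.ndarray,
                         sarcopenic: np.ndarray) -> float:
    """time_days: follow-up; event: 1 = death observed; sarcopenic: bool mask."""
    result = logrank_test(
        time_days[sarcopenic], time_days[~sarcopenic],
        event_observed_A=event[sarcopenic],
        event_observed_B=event[~sarcopenic])
    return result.p_value
```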
- DR-GPT: A large language model for medical report analysis of diabetic retinopathy patients
A1 Original article in a scientific journal (2024-10) Jaskari, Joel; Sahlsten, Jaakko; Summanen, Paula; Moilanen, Jukka; Lehtola, Erika; Aho, Marjo; Säpyskä, Elina; Hietala, Kustaa; Kaski, Kimmo
Diabetic retinopathy (DR) is a sight-threatening condition caused by diabetes. Screening programmes for DR include eye examinations in which the patient's fundi are photographed and the findings, including DR severity, are recorded in the medical report. However, statistical analyses based on DR severity require structured labels, which calls for a laborious manual annotation process when the report format is unstructured. In this work, we propose DR-GPT, a large language model for classifying DR severity from unstructured medical reports. On a clinical set of medical reports, DR-GPT reaches a quadratic weighted Cohen's kappa of 0.975 using the truncated Early Treatment Diabetic Retinopathy Study scale. When DR-GPT annotations for unlabeled data are paired with corresponding fundus images, the additional data improve image classifier performance with statistical significance. Our analysis shows that large language models can be applied to unstructured medical report databases to classify diabetic retinopathy, with a variety of applications.
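The quadratic weighted Cohen's kappa quoted above is available in scikit-learn; a minimal sketch with illustrative grades:

```python
# Minimal sketch of quadratic-weighted Cohen's kappa between reference
# DR grades and DR-GPT-style predicted grades (illustrative values).
from sklearn.metrics import cohen_kappa_score

reference = [0, 1, 2, 3, 4, 2, 1]   # clinician grading on an ordinal scale
predicted = [0, 1, 2, 3, 3, 2, 1]   # model grading with one near miss
kappa = cohen_kappa_score(reference, predicted, weights="quadratic")
print(f"quadratic weighted kappa: {kappa:.3f}")
```

Quadratic weighting penalises a one-step disagreement on the ordinal scale only mildly, which is why it suits severity grading.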
Letter (2024-01) Wahid, Kareem A.; Sahlsten, Jaakko; Jaskari, Joel; Dohopolski, Michael J.; Kaski, Kimmo; He, Renjie; Glerean, Enrico; Kann, Benjamin H.; Mäkitie, Antti; Fuller, Clifton D.; Naser, Mohamed A.; Fuentes, David
- Multi-view Deep Learning for Diabetic Retinopathy Detection
School of Science | Master's thesis (2022-10-17) Jiang, Ming
Diabetic retinopathy is one of the complications of diabetes and a main cause of vision loss. Recent developments in deep convolutional neural networks (DCNNs) have achieved high performance in diabetic retinopathy detection, which could serve to assist professionals in making early diagnoses and providing treatment based on patients' fundus images. However, existing works have mostly focused on training models with only single-view fundus images, ignoring the interaction and possible correlation across fundus images taken from different views of a patient's retina, which might lead to sub-optimal outcomes. In this thesis, we propose a DCNN-based architecture to predict the degree of diabetic retinopathy from multi-view fundus images. Specifically, we utilize EfficientNet-B0 as the basic image feature extractor and then fuse the features per patient's eye with three different fusion methods to obtain a summarized representation. We preprocess three benchmark datasets by dividing each fundus image into multiple views and then randomly drop some views to simulate missing retinal images. Due to the ordinal labels, the quadratic weighted kappa score (kq) is used to measure performance. Motivated by the clinical situation, the baseline model is defined by taking the maximum over the predictions for the single-view images of a patient's eye. We run experiments on both the simulated multi-view benchmark datasets and a clinical dataset with the baseline model and our three fusion methods. With an average of 30% of the images dropped from each patient's fundus image set, the mean, max, and attention fusion methods achieved kq values of 81.92, 81.31, and 81.10, respectively, on the largest benchmark, while the baseline model had a kq value of 80.22. On the clinical dataset, the mean and max fusion achieved kq values of 74.42 and 73.47, respectively, compared to the baseline model's 71.66. However, the attention fusion method failed to converge on the clinical dataset.
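A minimal sketch of the three fusion heads compared in this thesis, operating on per-view features from a shared backbone (EfficientNet-B0 in the thesis); the attention layout below is an illustrative assumption:

```python
# Minimal sketch: fuse per-view feature vectors of one eye with mean, max,
# or attention pooling before an ordinal-grade classification head.
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    def __init__(self, feat_dim: int, n_grades: int, mode: str = "mean"):
        super().__init__()
        self.mode = mode
        self.attn = nn.Linear(feat_dim, 1)       # used only in attention mode
        self.head = nn.Linear(feat_dim, n_grades)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        """feats: (batch, n_views, feat_dim) backbone features per eye."""
        if self.mode == "mean":
            fused = feats.mean(dim=1)
        elif self.mode == "max":
            fused = feats.max(dim=1).values
        else:  # attention: softmax-weighted sum over views
            weights = torch.softmax(self.attn(feats), dim=1)
            fused = (weights * feats).sum(dim=1)
        return self.head(fused)
```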
Data Article (2022-08-02) Wahid, Kareem A.; Olson, Brennan; Jain, Rishab; Grossberg, Aaron J.; El-Habashy, Dina; Dede, Cem; Salama, Vivian; Abobakr, Moamen; Mohamed, Abdallah S.R.; He, Renjie; Jaskari, Joel; Sahlsten, Jaakko; Kaski, Kimmo; Fuller, Clifton D.; Naser, Mohamed A.
The accurate determination of sarcopenia is critical for disease management in patients with head and neck cancer (HNC). Quantitative determination of sarcopenia currently depends on manually generated segmentations of skeletal muscle derived from computed tomography (CT) cross-sectional imaging. This has prompted increasing utilization of machine learning models for automated sarcopenia determination. However, extant datasets do not provide the manually generated skeletal muscle segmentations at the C3 vertebral level needed for building these models. In this data descriptor, a set of 394 HNC patients were selected from The Cancer Imaging Archive, and their skeletal muscle and adipose tissue were manually segmented at the C3 vertebral level using sliceOmatic. Subsequently, using publicly disseminated Python scripts, we generated corresponding segmentation files in the Neuroimaging Informatics Technology Initiative (NIfTI) format. In addition to the segmentation data, additional clinical demographic data germane to body composition analysis have been retrospectively collected for these patients. These data are a valuable resource for studying sarcopenia and body composition analysis in patients with HNC.
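The format conversion described above (segmentations to NIfTI) is commonly done with nibabel; a minimal sketch, not the dataset's published scripts:

```python
# Minimal sketch: save a segmentation array as a NIfTI file with nibabel.
import numpy as np
import nibabel as nib

def save_mask_as_nifti(mask: np.ndarray, affine: np.ndarray, path: str) -> None:
    """mask: label array; affine: 4x4 voxel-to-world matrix from the source scan."""
    nib.save(nib.Nifti1Image(mask.astype(np.uint8), affine), path)

# Example (hypothetical names): save_mask_as_nifti(mask, ct.affine, "muscle_C3.nii.gz")
```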
- Puettavat mittausjärjestelmät terveyden seurannassa (Wearable measurement systems for health monitoring)
School of Electrical Engineering | Bachelor's thesis (2014-05-11) Sahlsten, Jaakko
- Reproducibility analysis of automated deep learning based localisation of mandibular canals on a temporal CBCT dataset
A1 Original article in a scientific journal (2023-12) Järnstedt, Jorma; Sahlsten, Jaakko; Jaskari, Joel; Kaski, Kimmo; Mehtonen, Helena; Hietanen, Ari; Sundqvist, Osku; Varjonen, Vesa; Mattila, Vesa; Prapayasatok, Sangsom; Nalampang, Sakarat
Preoperative radiological identification of mandibular canals is essential for maxillofacial surgery. This study demonstrates the reproducibility of a deep learning system (DLS) by evaluating its localisation performance on 165 heterogeneous cone beam computed tomography (CBCT) scans from 72 patients in comparison to an experienced radiologist's annotations. We evaluated the performance of the DLS using the symmetric mean curve distance (SMCD), the average symmetric surface distance (ASSD), and the Dice similarity coefficient (DSC). The reproducibility of the SMCD was assessed using the within-subject coefficient of repeatability (RC). Three other experts rated the diagnostic validity twice using a 0-4 Likert scale, and the reproducibility of the Likert scoring was assessed using the repeatability measure (RM). The RC of the SMCD was 0.969 mm, the median (interquartile range) SMCD and ASSD were 0.643 (0.186) mm and 0.351 (0.135) mm, respectively, and the mean (standard deviation) DSC was 0.548 (0.138). The DLS performance was most affected by postoperative changes. The RM of the Likert scoring was 0.923 for the radiologist and 0.877 for the DLS, and the mean (standard deviation) Likert score was 3.94 (0.27) for the radiologist and 3.84 (0.65) for the DLS. The DLS demonstrated proficient qualitative and quantitative reproducibility, temporal generalisability, and clinical validity.
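The within-subject coefficient of repeatability used above has a standard Bland-Altman form for paired repeat measurements; a minimal sketch, assuming two SMCD measurements per subject:

```python
# Minimal sketch of the repeatability coefficient RC = 1.96 * sqrt(2) * s_w,
# with the within-subject SD s_w estimated from paired repeat measurements.
import numpy as np

def repeatability_coefficient(m1: np.ndarray, m2: np.ndarray) -> float:
    """m1, m2: repeated measurements (e.g., SMCD in mm) per subject."""
    s_w = np.sqrt(np.mean((m1 - m2) ** 2) / 2.0)  # within-subject SD
    return 1.96 * np.sqrt(2.0) * s_w
```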
- Segmentation stability of human head and neck cancer medical images for radiotherapy applications under de-identification conditions: Benchmarking data sharing and artificial intelligence use-cases
A1 Original article in a scientific journal (2023) Sahlsten, Jaakko; Wahid, Kareem A.; Glerean, Enrico; Jaskari, Joel; Naser, Mohamed A.; He, Renjie; Kann, Benjamin H.; Mäkitie, Antti; Fuller, Clifton D.; Kaski, Kimmo
Background: Demand for head and neck cancer (HNC) radiotherapy data in algorithmic development has prompted increased image dataset sharing. Medical images must comply with data protection requirements so that re-use is enabled without disclosing patient identifiers. Defacing, i.e., the removal of facial features from images, is often considered a reasonable compromise between data protection and re-usability for neuroimaging data. While defacing tools have been developed by the neuroimaging community, their acceptability for radiotherapy applications has not been explored. Therefore, this study systematically investigated the impact of available defacing algorithms on HNC organs at risk (OARs). Methods: A publicly available dataset of magnetic resonance imaging scans for 55 HNC patients with eight segmented OARs (bilateral submandibular glands, parotid glands, level II neck lymph nodes, and level III neck lymph nodes) was utilized. Eight publicly available defacing algorithms were investigated: afni_refacer, DeepDefacer, defacer, fsl_deface, mask_face, mri_deface, pydeface, and quickshear. Using a subset of scans where defacing succeeded (N = 29), a 5-fold cross-validation 3D U-net based OAR auto-segmentation model was utilized to perform two main experiments: (1) comparing original and defaced data for training when evaluated on original data; (2) using original data for training and comparing the model evaluation on original and defaced data. Models were primarily assessed using the Dice similarity coefficient (DSC). Results: Most defacing methods were unable to produce any usable images for evaluation, while mask_face, fsl_deface, and pydeface failed to remove the face for 29%, 18%, and 24% of subjects, respectively. When using the original data for evaluation, the composite OAR DSC was statistically higher (p ≤ 0.05) for the model trained with the original data, with a DSC of 0.760, compared to the mask_face, fsl_deface, and pydeface models, with DSCs of 0.742, 0.736, and 0.449, respectively. Moreover, the model trained with original data had decreased performance (p ≤ 0.05) when evaluated on the defaced data, with DSCs of 0.673, 0.693, and 0.406 for mask_face, fsl_deface, and pydeface, respectively. Conclusion: Defacing algorithms may have a significant impact on HNC OAR auto-segmentation model training and testing. This work highlights the need for further development of HNC-specific image anonymization methods.
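The Dice similarity coefficient used throughout the comparisons above has a one-line form on binary masks; a minimal sketch:

```python
# Minimal sketch of the Dice similarity coefficient on binary masks.
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom
```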
School of Electrical Engineering | Bachelor's thesis (2024-04-26) Nayar, Aaryan
Diabetes mellitus is a chronic condition that arises due to increased blood glucose levels. Diabetic retinopathy is a complication of diabetes that occurs when high blood glucose damages the blood vessels in the eye, resulting in vision loss. There are 537 million patients with diabetes, of whom 103 million are diagnosed with diabetic retinopathy. This population is more vulnerable to contracting infections, and therefore the delivery of effective healthcare is required. A key point to note is that diabetic retinopathy can be prevented if it is detected early; moreover, the screening of diabetes and diabetic retinopathy is vital to ensure that patients receive optimal treatment. Telemedicine is one promising solution to this problem, as it enables the provision of healthcare services to patients who face challenges accessing them. The findings indicate that telemedicine in the management of diabetes provides better access to healthcare in remote areas and reduces wait times for specialist consultations. It assists in the grading of diabetic retinopathy and reduces the burden on ophthalmologists. Telemedicine improves quality of life by reducing the costs incurred in the management of diabetes. Furthermore, it can be utilised in educating patients about diabetes and its associated complications. Despite these benefits, problems such as low-quality images hinder cost-effectiveness, and potential inaccuracies affect the assessment of the condition. Telemedicine also poses a challenge to the patient-physician relationship due to reduced face-to-face interactions. Overcoming these barriers is important for its global implementation. Telemedicine has the potential to replace traditional management methods: the development of an affordable and reliable handheld camera could potentially remove the need for in-person visits, and integrating telemedicine into primary healthcare can help manage the growing demand for diabetes care driven by the growing population. Further development of the technology and further research are required for a more widespread implementation of telemedicine in the management of diabetes and the screening of diabetic retinopathy.
- Uncertainty-Aware Deep Learning Methods for Robust Diabetic Retinopathy Classification
A1 Original article in a scientific journal (2022) Jaskari, Joel; Sahlsten, Jaakko; Damoulas, Theodoros; Knoblauch, Jeremias; Särkkä, Simo; Kärkkäinen, Leo; Hietala, Kustaa; Kaski, Kimmo K.
Automatic classification of diabetic retinopathy from retinal images has been increasingly studied using deep neural networks, with impressive results. However, there is a clinical need for estimating uncertainty in the classifications, a shortcoming of modern neural networks. Recently, approximate Bayesian neural networks (BNNs) have been proposed for this task, but previous studies have only considered the binary referable/non-referable diabetic retinopathy classification applied to benchmark datasets. We present novel results for 9 BNNs by systematically investigating a clinical dataset and a 5-class classification scheme, together with benchmark datasets and the binary classification scheme. Moreover, we derive a connection between an entropy-based uncertainty measure and classifier risk, from which we develop a novel uncertainty measure. We observe that the previously proposed entropy-based uncertainty measure improves performance on the clinical dataset for the binary classification scheme, but not to the same extent as on the benchmark datasets, and that it improves performance in the 5-class classification scheme for the benchmark datasets but not for the clinical dataset. Our novel uncertainty measure generalizes to the clinical dataset and to one benchmark dataset. Our findings suggest that BNNs can be utilized for uncertainty estimation in classifying diabetic retinopathy on clinical data, though proper uncertainty measures are needed to optimize the desired performance measure. In addition, methods developed for benchmark datasets might not generalize to clinical datasets.
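As background for the entropy-based uncertainty measure discussed above, here is a minimal sketch of predictive entropy from Monte Carlo samples together with an uncertainty-based referral rule; the paper's novel measure is not reproduced.

```python
# Minimal sketch: K-class predictive entropy from BNN Monte Carlo samples,
# and referral of the most uncertain fraction of cases to a clinician.
import numpy as np

def predictive_entropy(mc_probs: np.ndarray) -> np.ndarray:
    """mc_probs: (n_samples, n_cases, n_classes) softmax outputs."""
    mean_p = mc_probs.mean(axis=0)
    return -(mean_p * np.log(mean_p + 1e-12)).sum(axis=-1)

def referral_mask(entropy: np.ndarray, fraction: float = 0.1) -> np.ndarray:
    """Boolean mask marking the top `fraction` most uncertain cases."""
    k = max(1, int(np.ceil(fraction * entropy.size)))
    mask = np.zeros(entropy.size, dtype=bool)
    mask[np.argsort(entropy)[-k:]] = True
    return mask
```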