Device Technologies and Biomedical Robotics
Elia Haghbin (she/her/hers)
Undergraduate Student Researcher
Fairfield University
Bridgeport, Connecticut, United States
Vlad S. Surdu (he/him/his)
Undergraduate Student Researcher
University of Toronto
Toronto, Ontario, Canada
Xiaoli Yang
Professor and Chair of Computer Science
Fairfield University, United States
Mixed reality technology blends virtual elements into the user’s surroundings for interactive visualizations and digital simulations. The HoloLens 2 headset instantiates a mixed-reality environment with a wearable, wireless design featuring holographic lenses, camera arrays, and precise eye-tracking capabilities. Holographic image-guidance models merged with surgical systems have shown promise for tumor localization, increasing incision accuracy and reducing the number of procedural attempts [1]. Remote HoloLens education for clinical procedures has further enhanced student learning with high consistency and accessibility [2]. However, real-time educational procedures face object-detection challenges when integrated with the mixed-reality environment. The YOLOv8 (You Only Look Once) algorithm leverages Ultralytics machine learning models for image recognition and classification. Although preliminary results show high YOLO object-detection rates [3], variations in object occlusion, environment lighting, and background complexity pose challenges to image-recognition accuracy. These object-detection challenges are further pronounced in hands-on educational activities and laboratory experiments. Anatomical dissections provide immersive student learning experiences for the study of biological organisms, structures, and functions. However, students encounter challenges in organ detection because of complex variations in organ structure, size, and orientation, as well as possible concealment by surrounding tissues. This study aims to enhance the student learning experience in fetal pig dissections by leveraging accurately trained YOLOv8 models for image recognition and interactive guidance with the HoloLens mixed-reality headset. Software development merged Unity3D and Visual Studio with the Computer Vision Annotation Tool (CVAT), Ultralytics Python algorithms, and the Open Neural Network Exchange (ONNX) format for training object-detection models.
Visual Studio was integrated with Unity3D through the Universal Windows Platform, C++ v143, and .NET development workloads. The data collection process utilized open-source images to assemble standard-format images containing 1,200 external and internal fetal pig organs. In the CVAT annotation workflow, organ labels were generated with corresponding text files containing the annotated bounding-box coordinates. A YAML data-serialization file supplied the image-recognition data through folder paths and the names of the twelve organ classes: 0:umbilical cord, 1:snout, 2:pinna, 3:mouth, 4:head, 5:eyes, 6:stomach, 7:lungs, 8:liver, 9:kidneys, 10:intestines, and 11:heart. In a virtual Python environment within Visual Studio Code, the YOLOv8 algorithm leveraged the Ultralytics library to train and validate the image-recognition models. The trained models were exported in ONNX format with image dimensions maintained at 320 px in width and 256 px in height. In the Unity Editor's model processor, scripts fed the camera’s visual information into the ONNX model to generate bounding boxes labeling potential organs. The HoloLens time-of-flight depth sensor used the centers of the bounding boxes with the highest probabilities of containing objects to place object labels at 3D coordinates. Interactive buttons were implemented in the mixed-reality user interface, with information panes displaying the functions of the detected organs. The trained models' image-recognition accuracy was determined through analysis of confusion matrices, losses, and precision metrics at 100 epochs.
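As an illustration of the training and export workflow described above, the minimal sketch below writes an Ultralytics dataset configuration for the twelve organ classes, trains a YOLOv8 model for 100 epochs, and exports it to ONNX at 320 px by 256 px. The file names (fetal_pig.yaml, the datasets/fetal_pig folder layout) and the yolov8n.pt base weights are illustrative assumptions, not the study's exact configuration.

```python
# Sketch of the YOLOv8 training and ONNX export pipeline (illustrative file
# names and base checkpoint; not the project's exact configuration).
from pathlib import Path
from ultralytics import YOLO

# Minimal Ultralytics dataset configuration: folder paths plus the twelve
# organ classes listed in the text (0: umbilical cord ... 11: heart).
dataset_yaml = """
path: datasets/fetal_pig   # assumed dataset root
train: images/train
val: images/val
names:
  0: umbilical cord
  1: snout
  2: pinna
  3: mouth
  4: head
  5: eyes
  6: stomach
  7: lungs
  8: liver
  9: kidneys
  10: intestines
  11: heart
"""
Path("fetal_pig.yaml").write_text(dataset_yaml)

# Train and validate a YOLOv8 detection model on the annotated images.
model = YOLO("yolov8n.pt")  # assumed base checkpoint
model.train(data="fetal_pig.yaml", epochs=100, imgsz=320)

# Export for the Unity/HoloLens runtime: ONNX input of 256 px height by 320 px width.
model.export(format="onnx", imgsz=(256, 320))
```

The training labels referenced by this configuration follow the YOLO text-file convention that CVAT exports: one line per bounding box containing the class index and normalized center, width, and height coordinates.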
The dissection tutorial was developed with Mixed Reality Toolkit holographic assets, with information buttons and instruction panes to guide students through the process. To determine the performance of the deep-learning models for the tutorial’s organ detection, the box losses for the training and validation batches over epochs were obtained from the Python environment. The internal organs’ final box losses were 1.3093 for training (Fig. 1a) and 1.1767 for validation (Fig. 2a), while the external organs' final box losses were 1.4971 for training (Fig. 1b) and 1.3678 for validation (Fig. 2b). Since the loss curves plateau at similar final values, the models fit the data effectively for detecting anatomical features, with minimal accumulated error in performance. The mean average precisions (mAP50) were obtained from the precision-recall curves generated at an Intersection over Union (IoU) threshold of 0.5. The final mAP50 values were 0.87047 for the internal organs (Fig. 3a) and 0.83138 for the external organs (Fig. 3b). Since the models maintain high precision even at high recall levels, these values indicate precise detection of most organ pixels without selecting non-organ pixels in the image. True-positive predictions were obtained from the models’ normalized confusion matrices to determine accuracy in organ differentiation. The average true-positive rates were 0.84 for internal organs, with a maximum of 0.93 for intestine detection (Fig. 4a), and 0.86 for external organs, with a maximum of 0.99 for head detection (Fig. 5a). Given the high true-positive rates, the models effectively distinguish the different internal and external organs, with some difficulty dissociating objects from their backgrounds leading to a few false-positive labels. The development of a mixed-reality system for organ detection in fetal pig dissection has shown promising results with high recognition accuracy. The integration of the YOLOv8 algorithm therefore has the potential to enhance the student learning experience with more interactive, informative, and engaging dissections. Future work will incorporate usability tests with quantitative questionnaires to analyze student satisfaction, dissection quality, and laboratory completion time.
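For readers unfamiliar with the mAP50 criterion reported above, the hedged sketch below shows the Intersection over Union computation that underlies it: a predicted box counts as a true positive when its IoU with a ground-truth box of the same class is at least 0.5. The function and the example boxes are illustrative, not drawn from the study's data.

```python
# Minimal IoU computation behind the mAP50 metric: a detection is a true
# positive when its IoU with a matching ground-truth box is >= 0.5.
def iou(box_a, box_b):
    """Boxes are (x_min, y_min, x_max, y_max) in pixels."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection rectangle (zero area if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Illustrative example: a predicted liver box versus its annotation.
predicted = (40, 60, 180, 200)
ground_truth = (50, 70, 190, 210)
print(f"IoU = {iou(predicted, ground_truth):.3f}")  # counts as a hit if >= 0.5
```

In the Ultralytics workflow, the per-class precision-recall curves, mAP50 summaries, and normalized confusion matrices are produced automatically during validation, which is the kind of output the figures above report.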
[1] Torabinia M, Caprio A, Fenster TB, Mosadegh B. “Single Evaluation of Use of a Mixed Reality Headset for Intra-Procedural Image-Guidance during a Mock Laparoscopic Myomectomy on an Ex-Vivo Fibroid Model.” Applied Sciences. 2022; 12(2):563. doi: 10.3390/app12020563.
[2] Bala L., Kinross J., Martin G., Koizia L.J., Kooner A.S., Shimshon G.J., Hurxkens T.J., Pratt P.J., Sam A.H. “A remote access mixed reality teaching ward round.” The Clinical Teacher. 2021; 18:386–390. doi: 10.1111/tct.13338.
[3] H. Bahri, D. Krčmařík and J. Kočí, “Accurate Object Detection System on HoloLens Using YOLO Algorithm,” International Conference on Control, Artificial Intelligence, Robotics and Optimization (ICCAIRO), 2019, pp. 219-224, doi: 10.1109/ICCAIRO47923.2019.00042.