Multiple Anatomical Structure Recognition in Fetal Ultrasound Images

Feb 2019 · Jake Bewick
Abstract
Ultrasound imaging is heavily dependent on operator skill: sonographers are trained to recognise key anatomical features as they navigate around the body. This is not too dissimilar to how convolutional neural networks (CNNs) are trained for image classification. A robust and accurate CNN could therefore assist sonographers in identifying anatomy that is difficult to interpret. In this report we train and optimise CNNs in multiclass and one-vs-rest configurations to classify fetal ultrasound images with 87.81±1.96% accuracy.
Summary
This machine learning study develops convolutional neural networks (CNNs) to classify fetal ultrasound images into anatomical regions, achieving a high testing accuracy of 87.81% through data augmentation and hyperparameter optimization.

In this report I develop and evaluate custom convolutional neural networks (CNNs) for classifying anatomical structures in fetal ultrasound (US) images. The study aims to assist sonographers by automating the recognition of key anatomical regions: head, heart, abdomen, and others.

Fetal heart ultrasound image
Figure: Ultrasound imaging of a fetal heart. Sonographers may struggle to clearly identify key anatomical features, leading to clinical error.

CNNs were trained on a labeled dataset of 266 subjects. To improve classification performance and address class imbalance, I used elastic deformation for data augmentation. Two models were tested: a multiclass CNN using a softmax output, and a one-vs-rest CNN using sigmoid outputs.
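Elastic deformation warps each image with a smooth random displacement field, producing plausible anatomical variation. Below is a minimal sketch, assuming greyscale frames stored as 2-D NumPy arrays; the `alpha` and `sigma` values are illustrative, not the settings used in the report:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=34.0, sigma=4.0, rng=None):
    """Warp a 2-D image with a random smooth displacement field."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape

    # Random per-pixel displacements, Gaussian-smoothed so that
    # neighbouring pixels move together; alpha scales the magnitude.
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha

    # Resample the image at the displaced coordinates (bilinear interpolation).
    y, x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.vstack([(y + dy).ravel(), (x + dx).ravel()])
    return map_coordinates(image, coords, order=1, mode="reflect").reshape(h, w)
```

Applying this more aggressively to under-represented classes also helps rebalance the training set, since each warped copy acts as a new example.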

Hyperparameter optimization through grid search identified the optimal CNN architecture, achieving a testing accuracy of 87.81±1.96%, outperforming a simpler base model. Data augmentation and dropout regularization were shown to significantly improve model performance.
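The grid search itself can be a plain loop over candidate settings. A sketch follows, where `build_cnn` is a hypothetical model factory (defined under the architecture figure below) and the parameter ranges are illustrative rather than the grid actually searched:

```python
from sklearn.model_selection import ParameterGrid

# x_train/y_train and x_val/y_val are assumed to be preloaded NumPy arrays.
grid = ParameterGrid({
    "filters": [16, 32, 64],
    "dropout": [0.25, 0.5],
    "learning_rate": [1e-3, 1e-4],
})

best_acc, best_params = 0.0, None
for params in grid:
    model = build_cnn(**params)
    model.fit(x_train, y_train, epochs=20,
              validation_data=(x_val, y_val), verbose=0)
    _, acc = model.evaluate(x_val, y_val, verbose=0)
    if acc > best_acc:
        best_acc, best_params = acc, params
```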

Figure: Architecture of my optimised convolutional neural network.
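Here is a hedged Keras sketch of a network of this general shape; the layer widths and the 128×128 input size are assumptions, since the exact optimised architecture came out of the grid search:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(filters=32, dropout=0.5, learning_rate=1e-3, n_classes=4):
    model = keras.Sequential([
        keras.Input(shape=(128, 128, 1)),                # assumed input size
        layers.Conv2D(filters, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(filters * 2, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout),                         # dropout regularisation
        layers.Dense(n_classes, activation="softmax"),   # multiclass head
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate),
                  loss="categorical_crossentropy",       # labels assumed one-hot
                  metrics=["accuracy"])
    return model
```

For the one-vs-rest configuration, the softmax head would instead be replaced by sigmoid outputs trained with binary cross-entropy, one per class.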

While the multiclass model performed slightly better overall, the one-vs-rest model achieved comparable F1 scores at a higher computational cost. Visualization techniques such as confusion matrices and t-SNE plots were used to understand misclassifications, most notably between the heart and abdomen classes.
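A sketch of those two diagnostics, reusing the hypothetical `model` and one-hot test arrays from the earlier snippets:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.metrics import confusion_matrix
from tensorflow import keras

# Confusion matrix: off-diagonal counts expose e.g. heart/abdomen mix-ups.
y_true = y_test.argmax(axis=1)
y_pred = model.predict(x_test).argmax(axis=1)
print(confusion_matrix(y_true, y_pred))

# t-SNE embedding of the penultimate dense-layer features.
feature_model = keras.Model(model.input, model.layers[-3].output)  # 128-unit layer
features = feature_model.predict(x_test)
embedded = TSNE(n_components=2).fit_transform(features)
plt.scatter(embedded[:, 0], embedded[:, 1], c=y_true, s=5)
plt.show()
```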