AI-based Tool
To understand the importance of the features learnt during transfer learning with Deep Convolutional Neural Networks (DCNNs), I selected the problem of classifying chest X-rays as COVID-19 positive or negative. I intend to perform bit plane slicing and provide these bit planes separately as input to the DCNNs for transfer learning.
The dataset used for this study can be obtained from Kaggle. It consists of 676 training images and 408 validation images, covering both the positive and the negative class. The bit plane sliced images are generated manually from this dataset, as explained in the later sections.
Initially, the performance of VGG16, InceptionV3, Inception-ResNet, DenseNet121, MobileNetV2, and ResNet101 is measured on the original (unsliced) dataset to check which model performs best. The general flow of pre-processing, training and evaluation is as follows:
Firstly, we perform image augmentation. DCNNs typically need a substantial amount of training data to perform well. Image augmentation is a method to improve the performance of DCNNs when building a robust image classifier from very little training data: it generates training images artificially using various processing methods, or a mix of techniques such as random rotations, shifts, shears, and flips. Traditionally, the augmented images were saved alongside the original dataset and then fed to the DCNNs for training; however, this causes memory and storage constraints, so TensorFlow provides real-time data augmentation, where augmented batches are generated on the fly as the model trains, and that is what is used in this study. This is done for both the training and the validation datasets.
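A minimal sketch of this real-time augmentation with Keras' ImageDataGenerator is shown below; the directory names, target size, and augmentation parameters are illustrative assumptions rather than the exact values used in the notebook.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmented batches are generated on the fly during training, so no extra
# copies of the images are written to disk.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # scale pixel values to [0, 1]
    rotation_range=15,        # random rotations
    width_shift_range=0.1,    # random horizontal shifts
    height_shift_range=0.1,   # random vertical shifts
    shear_range=0.1,          # random shear
    horizontal_flip=True,     # random horizontal flips
)

# Hypothetical directory layout: one sub-folder per class (positive/negative).
# The same generator is used for the training and the validation sets.
train_gen = datagen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=32, class_mode="binary")
val_gen = datagen.flow_from_directory(
    "data/val", target_size=(224, 224), batch_size=32, class_mode="binary")
```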
Training configuration:
Batch Size: 32
Learning Rate: 0.0001
Epochs: 50
EarlyStopping Patience: 3
ReduceLROnPlateau Patience: 2
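With those settings, the compile-and-fit step looks roughly as follows. Here `model` stands for whichever architecture is being evaluated (a MobileNetV2 example is sketched later), and the monitored metrics and the learning-rate reduction factor are assumptions.

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

callbacks = [
    # Stop training if validation accuracy does not improve for 3 epochs.
    EarlyStopping(monitor="val_accuracy", patience=3, restore_best_weights=True),
    # Shrink the learning rate if validation loss plateaus for 2 epochs.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]

model.compile(
    optimizer=Adam(learning_rate=1e-4),   # learning rate 0.0001
    loss="binary_crossentropy",           # binary positive/negative task
    metrics=["accuracy"],
)

history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=50,
    callbacks=callbacks,
)
```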
Accuracy and Validation Accuracy compared for the various models
Loss and Validation Loss compared for the various models
Observations from the graphs:
VGG16: The VGG16 model is unable to improve its validation accuracy for three epochs, potentially because it fails to learn the vital features that distinguish COVID-19 positive from negative cases. As seen in the graph, training stops after those three epochs, and within those iterations it is evident that the model overfits on the training data, as it is unable to perform adequately on the validation dataset.
InceptionV3, Inception-ResNet, DenseNet121, and ResNet101: The InceptionV3 model shows high variance when validated on unseen data. The other models, Inception-ResNet, DenseNet121, and ResNet101, also show a large gap between their training and validation accuracies and losses.
MobileNetV2: The MobileNetV2 model performs the best among the candidates. It has the highest training accuracy, about 95%, and a validation accuracy of about 91%; while this still indicates overfitting, the extent of overfitting is the smallest in this case. The same holds for the losses: the gap between the training and validation loss is the smallest.
In summary, the MobileNetV2 baseline exhibited promising performance, with accuracy increasing and loss decreasing as training progressed. The validation accuracy and loss further demonstrate the model's capability to generalize effectively. These results emphasize the suitability of the MobileNetV2 architecture for the given task and dataset, and indicate potential for further optimization or fine-tuning. Therefore, bit plane slicing is performed first, and the MobileNetV2 model is then trained on the various extracted bit planes.
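For reference, a typical transfer-learning setup for one of these backbones (shown here for MobileNetV2) can be sketched as follows; the frozen base, pooling layer, dropout rate, and dense head are assumptions about a standard configuration rather than the exact head used in the notebook.

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import MobileNetV2

def build_model():
    # ImageNet-pretrained convolutional base, without its original classifier.
    base = MobileNetV2(weights="imagenet", include_top=False,
                       input_shape=(224, 224, 3))
    base.trainable = False  # freeze pretrained features for transfer learning

    # Small binary classification head: COVID-19 positive vs. negative.
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dropout(0.3)(x)
    out = layers.Dense(1, activation="sigmoid")(x)
    return Model(inputs=base.input, outputs=out)

model = build_model()
```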
Pixels are digital numbers composed of bits. Instead of emphasizing the full gray-level range, we choose to observe each bit's contribution; this can be done using bit plane slicing. By isolating particular bits of the pixel values in an image, we can often highlight interesting aspects of that image. The higher-order bits usually contain most of the important visual information, while the lower-order bits carry subtle details.
Bit Plane Slicing for an 8 bit image
The image below shows the different planes obtained for a CXR. As we can observe, the lower bit plane images are not visually informative, whereas the higher ones contain significant information. The lower bit planes hold information that is not visually interpretable but is still picked up by the DCNN, as we will see in the sections ahead.
Different Planes for a CXR
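Extracting these planes takes only a few lines of OpenCV/NumPy; a minimal sketch is shown below (the file names are placeholders).

```python
import cv2

# Read the chest X-ray as a single-channel 8-bit image.
img = cv2.imread("cxr.png", cv2.IMREAD_GRAYSCALE)

# For an 8-bit pixel, plane k holds the k-th bit of every pixel value:
# plane 0 is the least significant bit, plane 7 the most significant.
for k in range(8):
    plane = (img >> k) & 1                       # isolate bit k of each pixel
    cv2.imwrite(f"plane_{k}.png", plane * 255)   # stretch 0/1 to 0/255 for saving
```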
Now that we've established that MobileNetV2 performs the best, we train the MobileNetV2 model on the bit plane sliced images shown above.
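One way to run these per-plane experiments is to repeat the same training recipe over each bit-plane directory. A hedged sketch, reusing `build_model`, `datagen`, and `callbacks` from the earlier sketches, and assuming each plane's images are stored in their own `data/plane_k/train` and `data/plane_k/val` folders (a hypothetical layout):

```python
from tensorflow.keras.optimizers import Adam

histories = {}
for k in range(8):
    # Fresh frozen-base MobileNetV2 for every plane so runs do not interfere.
    model = build_model()
    model.compile(optimizer=Adam(learning_rate=1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])

    train_gen = datagen.flow_from_directory(
        f"data/plane_{k}/train", target_size=(224, 224),
        batch_size=32, class_mode="binary")
    val_gen = datagen.flow_from_directory(
        f"data/plane_{k}/val", target_size=(224, 224),
        batch_size=32, class_mode="binary")

    histories[k] = model.fit(train_gen, validation_data=val_gen,
                             epochs=50, callbacks=callbacks)
```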
Bit Plane 0 (Training and Validation Accuracy and Loss)
Bit Plane 1 (Training and Validation Accuracy and Loss)
Bit Plane 2 (Training and Validation Accuracy and Loss)
Bit Plane 3 (Training and Validation Accuracy and Loss)
Bit Plane 4 (Training and Validation Accuracy and Loss)
Bit Plane 5 (Training and Validation Accuracy and Loss)
Bit Plane 6 (Training and Validation Accuracy and Loss)
Bit Plane 7 (Training and Validation Accuracy and Loss)
It would be fair to conclude that the DCNN overfits when individual bit planes are provided as input. Hence, training on the entire image without any bit plane extraction remains the more useful approach; fortunately, a lot of work has already been done on this, listed below in the references section.
The code is written in Python 3 and run in a Jupyter notebook, which can be accessed here - CXR COVID.