AI-based Tool
To understand the importance of the features learnt during transfer learning with Deep Convolutional Neural Networks (DCNNs), I selected the problem of classifying chest X-rays as COVID-19 positive or negative. I intend to perform bit plane slicing and provide these bit planes separately as input to the DCNNs for transfer learning.
The dataset used for this study can be obtained from Kaggle. It consists of 676 training images and 408 validation images, covering both the positive and the negative class. The bit plane sliced images are generated manually from this dataset, as explained in the later sections.
Initially, the performance of VGG16, InceptionV3, Inception-ResNet, DenseNet121, MobileNetV2, and ResNet101 is measured on the original (unsliced) dataset to check which model performs best. The general flow of pre-processing, training and evaluation is as follows:
Firstly, we perform image augmentation. DCNNs typically need a substantial amount of training data to perform well. Image augmentation is a method to improve the performance of DCNNs when building a robust image classifier from very little training data: it generates training images artificially using various processing methods, or a mix of techniques such as random rotations, shifts, shears, and flips. Traditionally, the augmented images were saved alongside the original dataset and then fed to the DCNNs for training; however, this causes memory and storage constraints, so TensorFlow provides real-time data augmentation, where augmented batches are generated on the fly as the model trains, and that is what is used in this study. This is done for both the training and the validation datasets.
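A minimal sketch of this real-time augmentation with Keras' ImageDataGenerator is shown below; the directory names, target size, and augmentation parameters are illustrative assumptions rather than the exact values used in the notebook.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmented batches are generated on the fly during training, so no extra
# copies of the images are written to disk.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # scale pixel values to [0, 1]
    rotation_range=15,        # random rotations
    width_shift_range=0.1,    # random horizontal shifts
    height_shift_range=0.1,   # random vertical shifts
    shear_range=0.1,          # random shear
    horizontal_flip=True,     # random horizontal flips
)

# Hypothetical directory layout: one sub-folder per class (positive/negative).
# The same generator is used for the training and the validation sets.
train_gen = datagen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=32, class_mode="binary")
val_gen = datagen.flow_from_directory(
    "data/val", target_size=(224, 224), batch_size=32, class_mode="binary")
```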
Training configuration:
Batch Size: 32
Learning Rate: 0.0001
Epochs: 50
EarlyStopping Patience: 3
ReduceLROnPlateau Patience: 2
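With those settings, the compile-and-fit step looks roughly as follows. Here `model` stands for whichever architecture is being evaluated (a MobileNetV2 example is sketched later), and the monitored metrics and the learning-rate reduction factor are assumptions.

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

callbacks = [
    # Stop training if validation accuracy does not improve for 3 epochs.
    EarlyStopping(monitor="val_accuracy", patience=3, restore_best_weights=True),
    # Shrink the learning rate if validation loss plateaus for 2 epochs.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]

model.compile(
    optimizer=Adam(learning_rate=1e-4),   # learning rate 0.0001
    loss="binary_crossentropy",           # binary positive/negative task
    metrics=["accuracy"],
)

history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=50,
    callbacks=callbacks,
)
```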
Accuracy and Validation Accuracy compared for the various models
Loss and Validation Loss compared for the various models
Observations from the graphs:
VGG16: The VGG16 model is unable to improve its validation accuracy for three epochs, potentially because it fails to learn the vital features that distinguish COVID-19 positive from negative cases. As seen in the graph, training stops after those three epochs, and within those iterations it is evident that the model overfits on the training data, as it is unable to perform adequately on the validation dataset.
InceptionV3, Inception-ResNet, DenseNet121, and ResNet101: The InceptionV3 model shows high variance when validated on unseen data. The other models, Inception-ResNet, DenseNet121, and ResNet101, also show a large gap between their training and validation accuracies and losses.
MobileNetV2: The MobileNetV2 model performs the best among the candidates. It has the highest training accuracy, about 95%, and a validation accuracy of about 91%; while this still indicates overfitting, the extent of overfitting is the smallest in this case. The same holds for the losses: the gap between the training and validation loss is the smallest.
In summary, the MobileNetV2 baseline exhibited promising performance, with accuracy increasing and loss decreasing as training progressed. The validation accuracy and loss further demonstrate the model's capability to generalize effectively. These results emphasize the suitability of the MobileNetV2 architecture for the given task and dataset, and indicate potential for further optimization or fine-tuning. Therefore, bit plane slicing is performed first, and the MobileNetV2 model is then trained on the various extracted bit planes.
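For reference, a typical transfer-learning setup for one of these backbones (shown here for MobileNetV2) can be sketched as follows; the frozen base, pooling layer, dropout rate, and dense head are assumptions about a standard configuration rather than the exact head used in the notebook.

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import MobileNetV2

def build_model():
    # ImageNet-pretrained convolutional base, without its original classifier.
    base = MobileNetV2(weights="imagenet", include_top=False,
                       input_shape=(224, 224, 3))
    base.trainable = False  # freeze pretrained features for transfer learning

    # Small binary classification head: COVID-19 positive vs. negative.
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dropout(0.3)(x)
    out = layers.Dense(1, activation="sigmoid")(x)
    return Model(inputs=base.input, outputs=out)

model = build_model()
```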
Pixels are digital numbers composed of bits. Instead of emphasizing the full gray-level range, we choose to observe each bit's contribution; this can be done using bit plane slicing. By isolating particular bits of the pixel values in an image, we can often highlight interesting aspects of that image. The higher-order bits usually contain most of the important visual information, while the lower-order bits carry subtle details.
Bit Plane Slicing for an 8 bit image
The image below shows the different planes obtained for a CXR. As we can observe, the lower bit plane images are not visually informative, whereas the higher ones contain significant information. The lower bit planes hold information that is not visually interpretable but is still picked up by the DCNN, as we will see in the sections ahead.
Different Planes for a CXR
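Extracting these planes takes only a few lines of OpenCV/NumPy; a minimal sketch is shown below (the file names are placeholders).

```python
import cv2

# Read the chest X-ray as a single-channel 8-bit image.
img = cv2.imread("cxr.png", cv2.IMREAD_GRAYSCALE)

# For an 8-bit pixel, plane k holds the k-th bit of every pixel value:
# plane 0 is the least significant bit, plane 7 the most significant.
for k in range(8):
    plane = (img >> k) & 1                       # isolate bit k of each pixel
    cv2.imwrite(f"plane_{k}.png", plane * 255)   # stretch 0/1 to 0/255 for saving
```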
Now that we've established that MobileNetV2 performs the best, we train the MobileNetV2 model on the bit plane sliced images shown above.
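One way to run these per-plane experiments is to repeat the same training recipe over each bit-plane directory. A hedged sketch, reusing `build_model`, `datagen`, and `callbacks` from the earlier sketches, and assuming each plane's images are stored in their own `data/plane_k/train` and `data/plane_k/val` folders (a hypothetical layout):

```python
from tensorflow.keras.optimizers import Adam

histories = {}
for k in range(8):
    # Fresh frozen-base MobileNetV2 for every plane so runs do not interfere.
    model = build_model()
    model.compile(optimizer=Adam(learning_rate=1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])

    train_gen = datagen.flow_from_directory(
        f"data/plane_{k}/train", target_size=(224, 224),
        batch_size=32, class_mode="binary")
    val_gen = datagen.flow_from_directory(
        f"data/plane_{k}/val", target_size=(224, 224),
        batch_size=32, class_mode="binary")

    histories[k] = model.fit(train_gen, validation_data=val_gen,
                             epochs=50, callbacks=callbacks)
```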
Bit Plane 0 (Training and Validation Accuracy and Loss)
Bit Plane 1 (Training and Validation Accuracy and Loss)
Bit Plane 2 (Training and Validation Accuracy and Loss)
Bit Plane 3 (Training and Validation Accuracy and Loss)
Bit Plane 4 (Training and Validation Accuracy and Loss)
Bit Plane 5 (Training and Validation Accuracy and Loss)
Bit Plane 6 (Training and Validation Accuracy and Loss)
Bit Plane 7 (Training and Validation Accuracy and Loss)
It would be fair to conclude that the DCNN overfits when individual bit planes are provided as input. Hence, training on the entire image without any bit plane extraction remains the more useful approach; fortunately, a lot of work has already been done on this, listed below in the references section.
The code is written in Python 3 and run in a Jupyter notebook, which can be accessed here - CXR COVID.