Chest X-Ray Classification

AI-based Tool

Introduction

To understand the importance of the features learnt during transfer learning with Deep Convolutional Neural Networks (DCNNs), I selected the problem of classifying chest X-rays (CXRs) as COVID-19 positive or negative. The idea is to perform bit plane slicing and provide each bit plane separately as input to the DCNNs for transfer learning.

Dataset

The dataset used for this study can be obtained from Kaggle. It consists of 676 training images and 408 validation images, spanning both the positive and the negative class. The bit plane sliced images are derived from these manually, as explained in the later sections.

Preliminary Work

Initially, the performance of VGG16, InceptionV3, Inception-ResNet, DenseNet121, MobileNetV2, and ResNet101 is measured on the original (unsliced) dataset to determine which model performs best. The general flow of pre-processing, training, and evaluation is described below; a code sketch follows the list.

  1. Firstly, we perform image augmentation. DCNNs typically require a substantial amount of training data to reach high accuracy, and image augmentation is a way to build a robust image classifier from very little data. It generates training images artificially by applying various processing methods, or a mix of them, such as random rotations, shifts, shears, and flips. Earlier, augmented images were saved alongside the original dataset and fed to the DCNN for training; this caused memory constraints, so TensorFlow introduced real-time data augmentation, where images are transformed on the fly as the model trains, and that is the approach implemented in this study. Generators are set up for both the training and the validation datasets.

  2. Once pre-processing is complete, we define the hyperparameters for training the model. The batch size is the number of samples shown to the model before its weights are updated; it is set to 32 because a larger batch size degrades the model's ability to generalize, whereas a smaller one increases training time. Conceptually, the learning rate is the step size of each weight update; it is set to 0.0001, the value commonly adopted by the community for transfer learning problems. Finally, the number of epochs, i.e. how many times the model is trained on the entire dataset, is set to 50.
    Batch Size      32
    Learning Rate   0.0001
    Epochs          50
    
  3. Lastly, the training of the model begins. Two criteria are set for training: early stopping, where training halts if a specified metric fails to improve for a certain number of epochs (the patience), and learning rate reduction, where the learning rate is lowered once the metric has stopped improving for a given number of epochs, to allow finer convergence. The monitored metric is validation accuracy, which guards against overfitting. Along with this, the training and validation accuracy and loss are saved to a CSV file for later visualization.
    EarlyStopping Patience      3
    ReductionOfLR Patience      2
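
The flow above can be condensed into a short tf.keras sketch. This is a minimal illustration rather than the notebook's exact code: the directory names, the 224×224 input size, and the build_model helper are assumptions, and the backbone is interchangeable among the six models compared.

    import tensorflow as tf
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, CSVLogger

    BATCH_SIZE = 32
    LEARNING_RATE = 1e-4
    EPOCHS = 50
    IMG_SIZE = (224, 224)  # assumed input resolution

    # Step 1: real-time augmentation -- images are transformed on the fly
    # each epoch rather than saved to disk, avoiding memory constraints.
    train_gen = ImageDataGenerator(
        rescale=1.0 / 255,
        rotation_range=15,
        width_shift_range=0.1,
        height_shift_range=0.1,
        shear_range=0.1,
        horizontal_flip=True,
    )
    val_gen = ImageDataGenerator(rescale=1.0 / 255)  # validation images are only rescaled

    train_data = train_gen.flow_from_directory(
        "dataset/train", target_size=IMG_SIZE,
        batch_size=BATCH_SIZE, class_mode="binary")
    val_data = val_gen.flow_from_directory(
        "dataset/val", target_size=IMG_SIZE,
        batch_size=BATCH_SIZE, class_mode="binary")

    # Transfer learning: a frozen ImageNet backbone plus a small binary head.
    def build_model(backbone_fn):
        base = backbone_fn(weights="imagenet", include_top=False,
                           input_shape=IMG_SIZE + (3,))
        base.trainable = False
        x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
        out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
        return tf.keras.Model(base.input, out)

    # Step 3: training criteria -- stop early and reduce the learning rate
    # when validation accuracy plateaus; log metrics to CSV for the plots.
    callbacks = [
        EarlyStopping(monitor="val_accuracy", patience=3, restore_best_weights=True),
        ReduceLROnPlateau(monitor="val_accuracy", patience=2, factor=0.1),
        CSVLogger("training_log.csv"),
    ]

    model = build_model(tf.keras.applications.MobileNetV2)  # or VGG16, InceptionV3, ...
    model.compile(optimizer=tf.keras.optimizers.Adam(LEARNING_RATE),
                  loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(train_data, validation_data=val_data,
              epochs=EPOCHS, callbacks=callbacks)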
    


Accuracy and Validation Accuracy compared for the various models


Loss and Validation Loss compared for the various models

Observations from the graphs:

  1. VGG16: The VGG16 model is unable to improve its validation accuracy for three epochs, potentially because it fails to learn the vital features that distinguish COVID-19 positive cases from negative ones. As the graph shows, training stops after three epochs, and within those iterations it is evident that the model overfits the training data, since it is unable to perform adequately on the validation dataset.

  2. InceptionV3, Inception-ResNet, DenseNet121, and ResNet101: The InceptionV3 model shows a large variance when validated on unseen data. Inception-ResNet, DenseNet121, and ResNet101 likewise show a large gap between their training and validation accuracies and losses.

  3. MobileNetV2: The MobileNetV2 model performs the best of the six, with the highest training accuracy of about 95% and a validation accuracy of about 91%. While this still indicates overfitting, the extent of overfitting is the smallest here. The same holds for the losses: the gap between training and validation loss is the smallest.

In summary, the MobileNetV2 baseline exhibited promising performance, with accuracy increasing and loss decreasing as training progressed. The validation accuracy and loss further demonstrated the model's ability to generalize effectively. These results emphasize the suitability of the MobileNetV2 architecture for this task and dataset, and indicate the potential for further optimization or fine-tuning. Therefore, bit plane slicing is performed first, and the MobileNetV2 model is then trained on each of the extracted bit planes.

Bit Plane Slicing

Pixels are digital numbers composed of bits. Instead of emphasizing the full gray-level range, we can observe each bit's contribution; this is what bit plane slicing does. For an 8-bit image, plane k holds bit k of every pixel, i.e. (pixel >> k) & 1, giving eight binary images. By isolating particular bits of the pixel values, we can often highlight interesting aspects of an image: the higher-order bits usually contain most of the important visual information, while the lower-order bits carry subtle details.


Bit Plane Slicing for an 8-bit image

The image below shows the different planes obtained for a CXR. As we can observe, the lower bit plane images are not visually informative, whereas the higher ones contain significant information. The lower bit planes do carry information that is not visually interpretable, but it can still be picked up by the DCNN, as we will see in the sections ahead.


Different Planes for a CXR
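
As a minimal sketch, the planes can be extracted with OpenCV and NumPy as below; the file names are illustrative assumptions.

    import cv2
    import numpy as np

    # Read the CXR as an 8-bit grayscale image (pixel values 0-255).
    img = cv2.imread("cxr.png", cv2.IMREAD_GRAYSCALE)

    for k in range(8):
        # Isolate bit k of every pixel, then scale 0/1 up to 0/255 so the
        # plane can be saved and viewed as an ordinary image.
        plane = ((img >> k) & 1) * 255
        cv2.imwrite(f"bitplane_{k}.png", plane.astype(np.uint8))

Plane 7 (the most significant bit) then resembles a thresholded version of the original image, while plane 0 looks close to noise.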

Results and Discussions

Now that we’ve established the fact MobileNetV2 performs the best, we train the MobileNetV2 model on the bit plane sliced images as shown above.


Bit Plane 0 (Training and Validation Accuracy and Loss)


Bit Plane 1 (Training and Validation Accuracy and Loss)


Bit Plane 2 (Training and Validation Accuracy and Loss)


Bit Plane 3 (Training and Validation Accuracy and Loss)


Bit Plane 4 (Training and Validation Accuracy and Loss)


Bit Plane 5 (Training and Validation Accuracy and Loss)


Bit Plane 6 (Training and Validation Accuracy and Loss)


Bit Plane 7 (Training and Validation Accuracy and Loss)

It would be fair to conclude that the individual planes, when provided as input to the DCNN, cause it to overfit the data. Hence, training on the entire image without any extraction remains the more useful approach; fortunately, a great deal of work exists on exactly that, listed below in the references section.

Code

The code is written in Python 3 and run in a Jupyter notebook, which can be accessed here - CXR COVID.

References

  1. Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network
  2. COVID-19 identification in chest X-ray images on flat and hierarchical classification scenarios
  3. COVID-19 detection using deep learning models to exploit Social Mimic Optimization and structured chest X-ray images using fuzzy color and stacking approaches
  4. Automated detection of COVID-19 cases using deep neural networks with X-ray images
  5. COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images