Image Classification Model Documentation

This document outlines the development pipeline for an image classification model: a convolutional neural network (CNN) built with TensorFlow/Keras that classifies images into two categories. It covers data cleaning, preprocessing, model architecture, training, and evaluation, and closes with best practices and possible improvements.

Data Cleaning
Data Preprocessing
Model Architecture
Training
Evaluation
Best Practices and Future Improvements

Data Cleaning

The dataset images were cleaned to ensure data integrity before training. A Python script checks each image file with the PIL library’s Image.verify() method and deletes files that fail the check. The script was run on the train, validation, and test datasets. Note that Image.verify() inspects the file without decoding the pixel data, so it does not catch every kind of damage.

Why Data Cleaning Matters: Corrupted images (e.g., truncated files, incorrect formats) can cause errors during training or degrade model performance. Cleaning ensures robustness and consistency in the dataset. The script below checks for corrupted images and logs the number of files removed.

    
    
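import os
from PIL import Image

def clean_the_data(root_dir):
    """Delete files under root_dir that PIL cannot verify; return the number removed."""
    cleaned = 0
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # The context manager closes the file handle before any removal,
                # which matters on platforms that refuse to delete open files
                with Image.open(path) as img:
                    img.verify()  # Verify image integrity
            except Exception as e:
                # Any file PIL cannot open is removed, including non-image files
                print(f"Removing corrupt image: {path} ({e})")
                os.remove(path)
                cleaned += 1
    print(f"Cleaned {cleaned} corrupt images")
    return cleaned

# Example usage
clean_the_data("dataset/train")
clean_the_data("dataset/valid")
clean_the_data("dataset/test")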

Additional Considerations:

Format Consistency: Ensure all images are in compatible formats (e.g., JPEG, PNG) to avoid preprocessing errors.
Resolution Checks: Filter images with extreme resolutions to maintain uniformity after resizing.
Label Verification: Confirm each image has a correct label, especially for supervised learning tasks.
Metadata Handling: Remove or standardize EXIF data to prevent unintended biases (e.g., camera-specific artifacts).
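
Some of these checks can be scripted. The sketch below assumes the one-subfolder-per-class layout that image_dataset_from_directory expects. The function name summarize_dataset and the set of allowed extensions are illustrative choices, not part of the original pipeline. It counts the accepted files in each class folder and lists files with any other extension.

```python
from collections import Counter
from pathlib import Path

# Assumed set of accepted extensions; adjust to the formats in your dataset.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def summarize_dataset(root_dir, allowed=ALLOWED_EXTENSIONS):
    """Return (accepted files per class folder, files with unexpected extensions)."""
    counts = Counter()
    unexpected = []
    for class_dir in sorted(Path(root_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        counts[class_dir.name] = 0
        for path in sorted(class_dir.rglob("*")):
            if not path.is_file():
                continue
            if path.suffix.lower() in allowed:
                counts[class_dir.name] += 1
            else:
                unexpected.append(path)
    return dict(counts), unexpected

# Usage (path taken from the examples in this document):
# counts, unexpected = summarize_dataset("dataset/train")
```

A large gap between the per-class counts is an early hint of class imbalance, and an empty class folder usually points to a labeling or copying mistake.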

Note: For large datasets, consider parallelizing the cleaning process using multiprocessing to improve efficiency.
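
One possible way to parallelize the check is sketched below with concurrent.futures.ProcessPoolExecutor from the standard library. The helper names is_corrupt and clean_in_parallel, the worker count, and the chunk size are assumptions made for illustration. Verification runs in worker processes, while deletion stays in the parent process so that files are removed in one place.

```python
import os
from concurrent.futures import ProcessPoolExecutor

from PIL import Image

def is_corrupt(path):
    """Return True if PIL cannot open and verify the file at path."""
    try:
        with Image.open(path) as img:
            img.verify()
        return False
    except Exception:
        return True

def clean_in_parallel(root_dir, workers=4):
    """Verify files under root_dir in worker processes, then delete the bad ones."""
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _, filenames in os.walk(root_dir)
        for name in filenames
    ]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches the paths to reduce inter-process overhead
        flags = list(pool.map(is_corrupt, paths, chunksize=64))
    removed = [p for p, bad in zip(paths, flags) if bad]
    for p in removed:
        os.remove(p)
    return removed
```

When this is run as a script, the call to clean_in_parallel should sit under an `if __name__ == "__main__":` guard, because some platforms start worker processes by re-importing the main module.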

Data Preprocessing

Images were resized to 64x64 pixels and grouped into batches of 32 using TensorFlow’s image_dataset_from_directory function. The function infers integer labels from the subdirectory names, and batching lets the GPU process many images per step.

    
    
import tensorflow as tf

img_size = (64, 64)
batch_size = 32

# Recent TensorFlow releases expose this function under tf.keras.utils;
# tf.keras.preprocessing.image_dataset_from_directory is the older alias.
trained_data = tf.keras.utils.image_dataset_from_directory(
    "dataset/train",
    seed=42,
    image_size=img_size,
    batch_size=batch_size,
    label_mode='int'  # For sparse categorical crossentropy
)

valid_data = tf.keras.utils.image_dataset_from_directory(
    "dataset/valid",
    seed=42,
    image_size=img_size,
    batch_size=batch_size,
    label_mode='int'
)

tested_data = tf.keras.utils.image_dataset_from_directory(
    "dataset/test",
    shuffle=False,  # Fixed order keeps evaluation runs reproducible
    image_size=img_size,
    batch_size=batch_size,
    label_mode='int'
)

Visualization: Sample images from the training set were visualized to verify correct loading and labeling. This step is critical to detect issues like mislabeled data or incorrect preprocessing.
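
A minimal sketch of such a check is shown below. The helper name show_batch and the grid layout are assumptions; the original visualization code is not part of this document. It takes a batch as NumPy arrays and titles each image with its class name.

```python
import math

import matplotlib.pyplot as plt
import numpy as np

def show_batch(images, labels, class_names, max_images=9):
    """Plot up to max_images images in a grid, titled with their class names."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    n = min(len(images), max_images)
    if n == 0:
        raise ValueError("the batch is empty")
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows), squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # hide ticks, including on unused grid cells
    for i in range(n):
        ax = axes.flat[i]
        # Pixel values arrive as floats in [0, 255]; cast to uint8 for display
        ax.imshow(images[i].astype("uint8"))
        ax.set_title(class_names[int(labels[i])])
    return fig

# Usage with the training dataset defined above. Read class_names from the
# dataset returned by image_dataset_from_directory, before any .map() call:
# class_names = trained_data.class_names
# for images, labels in trained_data.take(1):
#     show_batch(images.numpy(), labels.numpy(), class_names)
#     plt.show()
```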

Enhanced Preprocessing:

Data Augmentation: To improve model generalization, apply techniques like random flips, rotations, and brightness adjustments. Example:

    
    
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

trained_data = trained_data.map(lambda x, y: (data_augmentation(x, training=True), y))

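Normalization: Scale pixel values to [0,1] or standardize them (mean=0, std=1) to stabilize training. image_dataset_from_directory yields float pixel values in [0, 255], so without a rescaling step the model receives unscaled inputs.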
Prefetching/Caching: Optimize data loading with trained_data.prefetch(tf.data.AUTOTUNE) to reduce I/O bottlenecks.
Class Imbalance: Check for imbalanced classes and apply oversampling, undersampling, or class weights if needed.
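
For the class-weight option, one common heuristic weights each class by n_samples / (n_classes * class_count), so that rarer classes contribute more to the loss. The sketch below is a plain-Python illustration; the function name balanced_class_weights and the label list are made up for the example.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(int(label) for label in labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}

# Hypothetical label list: 3 images of class 0 and 1 image of class 1
weights = balanced_class_weights([0, 0, 0, 1])
print(weights)  # class 0 gets 2/3, class 1 gets 2.0
```

A dictionary of this form can be passed to model.fit through its class_weight argument.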

Model Architecture

The CNN model was built using Keras’ Sequential API, designed for binary image classification with the following layers:

4 convolutional layers with increasing filters (32, 64, 128, 256), each followed by max pooling
Flatten layer to convert 2D feature maps to 1D
Dense layer with 256 neurons and ReLU activation
Dropout layer (40% rate) to prevent overfitting
Output dense layer with 2 neurons and softmax activation

    
    
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(64,64,3)),
    MaxPooling2D(2,2),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Conv2D(128, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Conv2D(256, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.4),
    Dense(2, activation='softmax')
])

model.summary()
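
The spatial sizes reported by model.summary() can be checked by hand. The short sketch below assumes the Keras defaults used above (no padding and stride 1 for Conv2D, pool size 2 for MaxPooling2D). The function name trace_spatial_size is illustrative.

```python
def trace_spatial_size(size=64, num_blocks=4, kernel=3, pool=2):
    """Track the feature-map side length through conv + max-pool blocks."""
    sizes = [size]
    for _ in range(num_blocks):
        size = size - kernel + 1   # 3x3 convolution without padding
        sizes.append(size)
        size = size // pool        # 2x2 max pooling (floor division)
        sizes.append(size)
    return sizes

sizes = trace_spatial_size()
print(sizes)                  # [64, 62, 31, 29, 14, 12, 6, 4, 2]
print(sizes[-1] ** 2 * 256)   # 1024 values enter the first Dense layer
```

The final 2x2 feature map with 256 channels means Flatten produces a vector of 1024 values.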

Architecture Insights:

Convolutional Layers: Extract spatial features like edges and textures. Increasing filters capture more complex patterns.
Max Pooling: Reduces spatial dimensions, which lowers computational cost and can help limit overfitting.
Dropout: Randomly deactivates neurons during training to enhance generalization.
Softmax: Outputs a probability for each of the two classes. A single sigmoid output unit trained with binary crossentropy is an equivalent alternative for binary classification.
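
To make the softmax output concrete, the NumPy sketch below uses made-up logits for two images. It checks that the probabilities sum to 1 and that, with two classes, the probability of class 1 equals the sigmoid of the difference between the two logits.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([[2.0, 0.5], [-1.0, 1.0]])  # made-up logits for two images
probs = softmax(logits)

# Each row is a probability distribution over the two classes
assert np.allclose(probs.sum(axis=1), 1.0)
# For two classes, P(class 1) equals the sigmoid of the logit difference
assert np.allclose(probs[:, 1], sigmoid(logits[:, 1] - logits[:, 0]))
```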

Improvements:

Add batch normalization after convolutional layers to stabilize training: BatchNormalization().
Use global average pooling instead of Flatten to reduce parameters: GlobalAveragePooling2D().
Experiment with architectures like ResNet or EfficientNet for better performance.
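
The first two suggestions could be combined as in the sketch below. This is an untested variant, not the model that produced the results in this document. The function name build_variant, the "same" padding, and the built-in Rescaling layer are assumptions added for illustration.

```python
import tensorflow as tf

def build_variant(input_shape=(64, 64, 3), num_classes=2):
    """CNN variant with rescaling, batch normalization, and global average pooling."""
    layers = [
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Rescaling(1.0 / 255),  # scale pixels to [0, 1]
    ]
    for filters in (32, 64, 128, 256):
        layers += [
            # The bias is redundant when batch normalization follows directly
            tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Activation("relu"),
            tf.keras.layers.MaxPooling2D(2),
        ]
    layers += [
        tf.keras.layers.GlobalAveragePooling2D(),  # one value per channel
        tf.keras.layers.Dropout(0.4),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ]
    return tf.keras.Sequential(layers)

variant = build_variant()
variant.summary()
```

Global average pooling passes only 256 values to the output layer, compared with 1024 after Flatten in the original model, and it removes the 256-unit Dense layer entirely.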

Training

The model was compiled with the Adam optimizer and sparse categorical crossentropy loss, then trained for up to 5 epochs on the training dataset, with the validation dataset used to monitor progress after each epoch.

    
    
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

history = model.fit(
    trained_data,
    validation_data=valid_data,
    epochs=5,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(patience=2, restore_best_weights=True)
    ]
)
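
The effect of patience=2 can be illustrated with a simplified model of the stopping rule. This is not the Keras implementation. The function name simulate_early_stopping and the loss values are made up for the example.

```python
def simulate_early_stopping(val_losses, patience=2):
    """Return (last_epoch_run, best_epoch), both 0-indexed."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # training would stop here
    return len(val_losses) - 1, best_epoch

# Made-up validation losses for 5 epochs
print(simulate_early_stopping([0.50, 0.40, 0.42, 0.41, 0.39]))  # (3, 1)
```

In this example training stops after the fourth epoch (index 3) because the loss did not improve on 0.40 for two consecutive epochs. With restore_best_weights=True, the weights from the second epoch (index 1) would be restored, and the lower loss at the fifth epoch is never reached.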

Training Details:

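Adam Optimizer: Adapts the step size for each parameter, which often speeds up convergence.
Loss Function: Sparse categorical crossentropy accepts integer class labels (as produced by label_mode='int') and works for two or more classes.
Early Stopping: Added to limit overfitting by halting training when validation loss has not improved for 2 consecutive epochs (patience=2), then restoring the weights from the best epoch.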
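
As an illustration of the loss, sparse categorical crossentropy is the mean of the negative log of the probability the model assigns to the true class. The NumPy sketch below is a hand-written version for clarity, not the Keras implementation, and the probabilities are made up.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels, eps=1e-7):
    """Mean of -log(probability assigned to the true class)."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    labels = np.asarray(labels, dtype=int)
    # Pick, for each row, the probability at the index given by its label
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(true_class_probs)))

# Made-up softmax outputs for two images, with integer labels 0 and 1
probs = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 1]
print(sparse_categorical_crossentropy(probs, labels))  # about 0.164
```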

Monitoring: Training and validation accuracy/loss were plotted to diagnose issues like overfitting or underfitting. Example visualization code:

    
    
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
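
The loss curves can be plotted the same way. The sketch below assumes the standard keys that Keras records when metrics=['accuracy'] is used with validation data (accuracy, val_accuracy, loss, val_loss). The helper name plot_history is illustrative.

```python
import matplotlib.pyplot as plt

def plot_history(history_dict):
    """Plot accuracy and loss curves side by side from a Keras-style history dict."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    for ax, metric in ((ax_acc, "accuracy"), (ax_loss, "loss")):
        ax.plot(history_dict[metric], label=f"Training {metric}")
        ax.plot(history_dict[f"val_{metric}"], label=f"Validation {metric}")
        ax.set_xlabel("Epoch")
        ax.set_ylabel(metric.capitalize())
        ax.legend()
    return fig

# Usage with the History object returned by model.fit above:
# plot_history(history.history)
# plt.show()
```

A validation loss that rises while the training loss keeps falling is the usual sign of overfitting.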

Evaluation

The trained model was evaluated on the test dataset, achieving approximately 96% accuracy and a loss of about 0.124.

    
    
test_loss, test_acc = model.evaluate(tested_data)
print(f"Test Accuracy: {test_acc:.5f}")
print(f"Test Loss: {test_loss:.5f}")

Evaluation Insights:

High Accuracy: 96% suggests good generalization, but further analysis (e.g., confusion matrix) is needed to confirm performance across classes.
Loss Interpretation: A loss of 0.124 is reasonable but could be reduced with longer training or model tweaks.
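
A confusion matrix can be built in a few lines. The NumPy sketch below uses made-up labels and predictions rather than the model's actual outputs; rows are true classes and columns are predicted classes.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=2):
    """Count (true class, predicted class) pairs; rows are true classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[int(t), int(p)] += 1
    return matrix

# Made-up labels and predictions for six images
example_true = [0, 0, 0, 1, 1, 1]
example_pred = [0, 0, 1, 1, 1, 0]
cm = confusion_matrix(example_true, example_pred)
print(cm)
# Per-class recall: correct predictions divided by the number of true samples
print(cm.diagonal() / cm.sum(axis=1))
```

Off-diagonal entries show which class is being mistaken for which, something a single accuracy figure hides.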

Additional Metrics: Compute precision, recall, and F1-score to assess performance, especially for imbalanced datasets.

    
    
from sklearn.metrics import classification_report
import numpy as np

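y_pred = []
y_true = []
# Iterate once so predictions and labels stay aligned batch by batch
for images, labels in tested_data:
    preds = model.predict(images, verbose=0)  # suppress the per-batch progress bar
    y_pred.extend(np.argmax(preds, axis=1))   # most probable class per image
    y_true.extend(labels.numpy())

print(classification_report(y_true, y_pred, target_names=['Class 0', 'Class 1']))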

Best Practices and Future Improvements

Best Practices for Image Classification:

Dataset Quality: Use high-quality, diverse images and ensure balanced class distribution.
Hyperparameter Tuning: Experiment with learning rates, batch sizes, and epochs using grid search or random search.
Transfer Learning: Leverage pre-trained models (e.g., VGG16, ResNet50) for better performance with limited data.
Model Interpretability: Use techniques like Grad-CAM to visualize which image regions influence predictions.
Deployment: Optimize the model for inference (e.g., quantization, TensorFlow Lite) if deploying on edge devices.
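
As a small illustration of random search, the sketch below draws candidate configurations. The search ranges, the function name sample_configs, and the seed are assumptions chosen for the example. Each configuration would then be trained and compared on validation performance.

```python
import random

def sample_configs(n_trials, seed=0):
    """Draw random hyperparameter configurations for a random search."""
    rng = random.Random(seed)
    configs = []
    for _ in range(n_trials):
        configs.append({
            # Log-uniform learning rate between 1e-4 and 1e-2
            "learning_rate": 10 ** rng.uniform(-4, -2),
            "batch_size": rng.choice([16, 32, 64]),
            "epochs": rng.randint(5, 20),
        })
    return configs

for config in sample_configs(3):
    print(config)
```

Sampling the learning rate on a logarithmic scale spreads the trials evenly across orders of magnitude, which a uniform draw over the same range would not do.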

Future Improvements:

Increase epochs or use learning rate scheduling to improve convergence.
Implement k-fold cross-validation for robust performance estimation.
Explore advanced architectures like Vision Transformers (ViT) for potentially better accuracy.
Add automated hyperparameter tuning with tools like Keras Tuner.
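
For k-fold cross-validation, the index bookkeeping can be sketched with NumPy as below. The function name k_fold_indices and the seed are illustrative. With a directory-based dataset, the indices would be applied to a list of file paths, which is an assumption about how the data would be reorganized.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs; each sample is in exactly one validation fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx

for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, k=5)):
    print(fold, len(train_idx), len(val_idx))
```

A fresh model would be trained on each train_idx split and scored on the matching val_idx split. The mean and spread of the k scores give a more robust estimate than a single validation run.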