Thursday, April 3, 2025

Image classification on small datasets with Keras

How should you train a convolutional neural network (CNN) when you only have a limited dataset? One approach is to start by augmenting your training data: generating new images from your existing ones, for example through rotations, flips, or other random transformations. This helps prevent overfitting and encourages your model to generalize better to unseen examples.

Having to train an image-classification model with very little data is a common situation, one you will likely encounter if you do computer vision in a real-world setting. A "handful" of examples can mean anywhere from a few dozen to a few thousand photographs. As a practical example, we will classify images as dogs or cats in a dataset containing 4,000 photographs (2,000 cats and 2,000 dogs). We will use 2,000 pictures for training, 1,000 for validation, and 1,000 for testing.

In Chapter 5 of the book, we review three effective techniques for tackling this problem. The first is training a small model from scratch on the little data available, which achieves an accuracy of 82%. We then move on to feature extraction with a pretrained network (reaching 90% accuracy) and fine-tuning that pretrained network (for a final accuracy of 97%). In this post, we will cover only the second and third techniques.

But is deep learning even relevant to small-data problems? As we will see, deep networks can still extract meaningful patterns from limited datasets, sometimes capturing subtle relationships that traditional machine-learning methods would miss.

You will sometimes hear that deep learning only works when lots of data is available. This is partly true: one fundamental characteristic of deep learning is that it can find interesting features in the training data on its own, without any need for manual feature engineering, and this generally requires a large number of training examples. This is especially true for problems where the input samples are very high-dimensional, such as images.

But what constitutes "enough" training examples is relative; relative, for starters, to the size and depth of the network you are trying to train. It is not possible to train a convnet to solve a complex problem with just a few dozen samples, but a few hundred can potentially suffice if the model is small and well regularized and the task is relatively simple. Because convnets learn local, translation-invariant features, they are highly data-efficient on perceptual problems. Training a convnet from scratch on a very small image dataset can still yield reasonable results, without the need for custom feature engineering or extensive domain expertise. You will see this in action as we proceed.

What's more, deep-learning models are by nature highly repurposable: you can take, say, an image-classification or speech-to-text model trained on a large-scale dataset and reuse it on a significantly different problem with only minor changes. In computer vision in particular, many pretrained models (usually trained on the large ImageNet dataset) are publicly available for download and can be used to bootstrap powerful vision models out of very little data. That's what you will do in the next section. Let's start by getting your hands on the data.

Downloading the data

The Dogs vs. Cats dataset that you will use is not packaged with Keras. It was released on Kaggle as part of a computer-vision competition in late 2013, back when convnets were not yet mainstream. To download the original dataset, you will need to create a Kaggle account if you do not already have one; don't worry, the process is painless.

The pictures are medium-resolution color JPEGs.

Unsurprisingly, the 2013 dogs-versus-cats Kaggle competition was won by entrants who used convnets. The best entries achieved close to 95% accuracy. You will end up with 97% accuracy, even though you will train your models on less than 10% of the data that was available to the competitors.

The dataset contains 25,000 images of dogs and cats (12,500 from each class) and is 543 MB compressed. After downloading and uncompressing it, you will create a new dataset containing three subsets: a training set with 1,000 samples of each class, a validation set with 500 samples of each class, and a test set with 500 samples of each class.
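Here is a minimal sketch of one way to lay out those directories with base R, assuming the Kaggle archive has been unpacked into original_dataset_dir; the paths and the copy_subset() helper are illustrative, not from the original post.

```r
# Illustrative paths: adjust to wherever you unpacked the Kaggle archive.
original_dataset_dir <- "~/Downloads/kaggle_original_data"
base_dir <- "~/Downloads/cats_and_dogs_small"
dir.create(base_dir, showWarnings = FALSE)

# Hypothetical helper: creates <base_dir>/<split>/<class> and copies a range
# of the original images (named like "cat.1.jpg", "dog.1500.jpg") into it.
copy_subset <- function(split, cls, index_range) {
  class_dir <- file.path(base_dir, split, cls)
  dir.create(class_dir, recursive = TRUE, showWarnings = FALSE)
  fnames <- paste0(cls, ".", index_range, ".jpg")
  file.copy(file.path(original_dataset_dir, fnames), class_dir)
}

for (cls in c("cat", "dog")) {
  copy_subset("train",      cls, 1:1000)      # 1,000 training images per class
  copy_subset("validation", cls, 1001:1500)   # 500 validation images per class
  copy_subset("test",       cls, 1501:2000)   # 500 test images per class
}

train_dir      <- file.path(base_dir, "train")
validation_dir <- file.path(base_dir, "validation")
test_dir       <- file.path(base_dir, "test")
```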


 

Using a pretrained convnet

A common and highly effective approach to deep learning on small image datasets is to use a pretrained network. A pretrained network is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. If this original dataset is large enough and general enough, then the spatial hierarchy of features learned by the pretrained network can effectively act as a generic model of the visual world, and its features can prove useful for many different computer-vision problems, even ones involving completely different classes than those of the original task. For instance, you might train a network on ImageNet (where classes are mostly animals and everyday objects) and then reuse this trained network to identify furniture items in images. Such portability of learned features across different problems is a key advantage of deep learning over older, shallow-learning approaches, and it makes deep learning very effective for small-data problems.

Let's consider a large convnet trained on the ImageNet dataset (1.4 million labeled images and 1,000 different classes). ImageNet contains many animal classes, including different species of cats and dogs, so you can expect it to perform well on the dogs-versus-cats classification problem.

You will use the VGG16 architecture, developed by Karen Simonyan and Andrew Zisserman in 2014; it is a simple and widely used convnet architecture for ImageNet. Although it is an older model, far from the current state of the art and somewhat heavier than many more recent models, I chose it because its architecture is similar to what you are already familiar with and is easy to understand without introducing any new concepts.

This may be your first encounter with one of these cutesy model names (VGG, ResNet, Inception, Inception-ResNet, Xception, and so on); you will get used to them, because they will come up frequently if you keep doing deep learning for computer vision.

There are two common ways to use a pretrained network: feature extraction and fine-tuning. We will cover both of them. Let's begin with feature extraction.

Feature extraction consists of using the representations learned by a previously trained network to extract interesting features from new samples. These features are then run through a new classifier, which is trained from scratch.

Convnets used for image classification comprise two parts: they start with a series of convolution and pooling layers, and they end with a densely connected classifier. The first part is called the convolutional base of the model. In the case of convnets, feature extraction consists of taking the convolutional base of a previously trained network, running the new data through it, and training a new classifier on top of the output.

Why reuse only the convolutional base? Could you reuse the densely connected classifier as well? In general, doing so should be avoided. The reason is that the representations learned by the convolutional base tend to be more generic and therefore more reusable: the feature maps of a convnet are presence maps of generic concepts over an image, which is likely to be useful regardless of the computer-vision problem at hand. The representations learned by the classifier, on the other hand, are necessarily specific to the set of classes the model was trained on; they contain only information about the presence probability of this or that class in the entire picture. Additionally, representations found in densely connected layers no longer carry any information about where objects are located in the input image: these layers discard the notion of space, whereas object locations are still described by convolutional feature maps. For problems where object location matters, densely connected features are largely useless.

The level of generality (and therefore reusability) of the representations extracted by specific convolution layers depends on the depth of the layer in the model. Layers that come earlier in the model extract local, highly generic feature maps (such as visual edges, colors, and textures), whereas layers higher up extract more abstract concepts (such as "cat ear" or "dog eye"). So if your new dataset differs a lot from the dataset on which the original model was trained, you may be better off using only the first few layers of the model for feature extraction, rather than the entire convolutional base.

In this case, because the ImageNet class set contains multiple dog and cat classes, it would likely be beneficial to reuse the information contained in the densely connected layers of the original model. But we will choose not to, in order to cover the more general case where the class set of the new problem does not overlap the class set of the original model.

Let's put this into practice by using the convolutional base of the VGG16 network, trained on ImageNet, to extract interesting features from cat and dog images, and then training a dogs-versus-cats classifier on top of those features.

The VGG16 model, along with other architectures, is bundled with the Keras deep learning framework.

The following is a list of pre-trained image-classification models, all of which are available in Keras and were trained on the ImageNet dataset.

  • Xception
  • Inception V3
  • ResNet50
  • VGG16
  • VGG19
  • MobileNet

Let's instantiate the VGG16 model.
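Here is a sketch of the call, using the application_vgg16() function from the keras R package; the pretrained weights are downloaded automatically on first use.

```r
library(keras)

conv_base <- application_vgg16(
  weights = "imagenet",          # checkpoint to initialize the model from
  include_top = FALSE,           # leave out the 1,000-class ImageNet classifier
  input_shape = c(150, 150, 3)   # shape of the image tensors we'll feed in
)

summary(conv_base)
```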

 

Three arguments are passed to the function:

  • weights specifies the weight checkpoint from which to initialize the model.
  • include_top refers to including (or not) the densely connected classifier on top of the network. By default, this classifier corresponds to the 1,000 classes from ImageNet. Because you intend to use your own densely connected classifier (with only two classes: cat and dog), you don't need to include it.
  • input_shape is the shape of the image tensors that you will feed to the network. This argument is purely optional: if you don't pass it, the network will be able to process inputs of any size.

Here is the detail of the architecture of the VGG16 convolutional base. It is similar to the simple convnets you are already familiar with.

Layer Type                        Output Shape          Param #
================================================================
1. Input Layer                    (None, 150, 150, 3)         0
2. Convolutional Block 1:
   - Convolutional Layer 1        (None, 150, 150, 64)     1792
   - Convolutional Layer 2        (None, 150, 150, 64)    36928
   - Max Pooling Layer            (None, 75, 75, 64)          0
3. Convolutional Block 2:
   - Convolutional Layer 1        (None, 75, 75, 128)     73856
   - Convolutional Layer 2        (None, 75, 75, 128)    147584
   - Max Pooling Layer            (None, 37, 37, 128)         0
4. Convolutional Block 3:
   - Convolutional Layer 1        (None, 37, 37, 256)    295168
   - Convolutional Layer 2        (None, 37, 37, 256)    590080
   - Convolutional Layer 3        (None, 37, 37, 256)    590080
   - Max Pooling Layer            (None, 18, 18, 256)         0
5. Convolutional Block 4:
   - Convolutional Layer 1        (None, 18, 18, 512)   1180160
   - Convolutional Layer 2        (None, 18, 18, 512)   2359808
   - Convolutional Layer 3        (None, 18, 18, 512)   2359808
   - Max Pooling Layer            (None, 9, 9, 512)           0
6. Convolutional Block 5:
   - Convolutional Layer 1        (None, 9, 9, 512)     2359808
   - Convolutional Layer 2        (None, 9, 9, 512)     2359808
   - Convolutional Layer 3        (None, 9, 9, 512)     2359808
   - Max Pooling Layer            (None, 4, 4, 512)           0
================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0

The final feature map has shape (4, 4, 512). That is the feature map on top of which you will attach a densely connected classifier.

At this point, there are two ways you could proceed:

  • Running the convolutional base over your dataset, recording its output to disk, and then using this data as input to a standalone, densely connected classifier similar to those introduced in part 1 of the book. This solution is fast and cheap to run, because it only requires running the convolutional base once per input image, and the convolutional base is by far the most expensive part of the pipeline. But for the same reason, this technique won't allow you to use data augmentation.

  • Extending the model you have (conv_base) by adding dense layers on top, and running the whole thing end to end on the input data. This allows you to use data augmentation, because every input image goes through the convolutional base every time it is seen by the model. But for the same reason, this technique is far more expensive than the first.

In this post, we will cover the second approach in detail (in the book, we cover both). Note that this approach is expensive enough that you should only attempt it if you have access to a GPU; it is essentially intractable on a CPU.

Because models behave just like layers, you can add a model (such as conv_base) to a sequential model just as you would add a layer.
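Here is a sketch of that stacking, following the sequential-model idiom of the keras R package:

```r
model <- keras_model_sequential() %>%
  conv_base %>%                                    # pretrained convolutional base
  layer_flatten() %>%                              # (4, 4, 512) -> 8,192 features
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")   # binary dog-vs-cat output

summary(model)
```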

 

Here is what the model looks like now:

Layer (type)                     Output Shape          Param #
================================================================
vgg16 (Model)                    (None, 4, 4, 512)     14714688
flatten_1 (Flatten)              (None, 8192)          0
dense_1 (Dense)                  (None, 256)           2097408
dense_2 (Dense)                  (None, 1)             257
================================================================
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0

As you can see, the convolutional base of VGG16 has 14,714,688 parameters, which is very large. The classifier you are adding on top has 2 million parameters.

Before you compile and train the model, it is very important to freeze the convolutional base. Freezing a layer or set of layers means preventing their weights from being updated during training. If you don't do this, the representations previously learned by the convolutional base will be modified during training. Because the dense layers on top are randomly initialized, very large weight updates would be propagated through the network, effectively destroying the representations previously learned.

In Keras, you freeze a network using the freeze_weights() function:
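For instance (a sketch; model$trainable_weights holds the list of weight tensors that would currently be updated during training):

```r
length(model$trainable_weights)   # weight tensors trainable before freezing

freeze_weights(conv_base)

length(model$trainable_weights)   # only the dense layers' tensors remain
```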

[1] 30
 
[1] 4

With this setup, only the weights from the two dense layers you added will be trained. That is a total of four weight tensors: two per layer (the main weight matrix and the bias vector). Note that for these changes to take effect, you must first compile the model. If you ever modify weight trainability after compilation, you must then recompile the model, or the changes will be ignored.

Using data augmentation

Overfitting is caused by having too few samples to learn from, leaving you unable to train a model that can generalize to new data. Given infinite training data, your model would be exposed to every possible aspect of the data distribution at hand and would never overfit. Data augmentation takes the approach of generating more training data from existing training samples, by applying a number of random transformations that yield believable-looking images. The goal is that at training time, the model never sees the exact same picture twice. This exposes the model to more aspects of the data and helps it generalize better.

In Keras, this is done by configuring a number of random transformations to be applied to the images read by an image_data_generator(). For example:
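Here is a sketch of such a generator; the specific ranges below are illustrative values, and each transformation is explained in the list that follows.

```r
train_datagen <- image_data_generator(
  rescale = 1/255,            # scale pixel values to [0, 1]
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)
```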

 

These are just a few of the options available (for more, refer to the Keras documentation). Let's quickly go over them:

  • rotation_range is a value in degrees (0–180), a range within which to randomly rotate pictures.
  • width_shift and height_shift are ranges (as a fraction of total width or height) within which to randomly translate pictures horizontally or vertically.
  • shear_range is for randomly applying shearing transformations.
  • zoom_range is for randomly zooming inside pictures.
  • horizontal_flip randomly flips half the images horizontally, which is relevant when there are no assumptions of horizontal asymmetry (for example, real-world pictures).
  • fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift.

Now, let's train the model using this data generator.
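Here is a sketch of that training run, assuming the train_dir and validation_dir paths and the train_datagen generator defined earlier; the batch size, step counts, and epoch count are illustrative. (This uses the generator-based API from older keras releases; newer versions spell the learning-rate argument learning_rate and fold fit_generator() into fit().)

```r
train_generator <- flow_images_from_directory(
  train_dir,
  train_datagen,                  # apply the augmentation defined above
  target_size = c(150, 150),      # resize all images to 150 x 150
  batch_size = 20,
  class_mode = "binary"           # binary labels for binary_crossentropy
)

# The validation data should only be rescaled, never augmented.
test_datagen <- image_data_generator(rescale = 1/255)
validation_generator <- flow_images_from_directory(
  validation_dir,
  test_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)

model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),   # low rate: the dense head is new
  metrics = c("accuracy")
)

history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,
  epochs = 30,
  validation_data = validation_generator,
  validation_steps = 50
)
```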

 

Let's plot the results. As you can see, the model reaches a validation accuracy of about 90%.
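With the keras R package, the returned history object can be plotted directly (a one-line sketch):

```r
plot(history)   # training/validation loss and accuracy per epoch
```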

Fine-tuning

Another widely used technique for model reuse, complementary to feature extraction, is fine-tuning. Fine-tuning consists of unfreezing a few of the top layers of the frozen model base used for feature extraction, and jointly training both the newly added part of the model (in this case, the fully connected classifier) and these top layers. It is called fine-tuning because it slightly adjusts the more abstract representations of the model being reused, in order to make them more relevant to the problem at hand.

I stated earlier that it was necessary to freeze the convolutional base of VGG16 in order to train a randomly initialized classifier on top. For the same reason, it is only possible to fine-tune the top layers of the convolutional base once the classifier on top has already been trained. If the classifier is not already trained, the error signal propagating through the network during fine-tuning will be too large, and the representations previously learned by the layers being fine-tuned will be destroyed. Thus, the steps for fine-tuning a network are as follows:

  • Add your custom network on top of an already-trained base network.
  • Freeze the base network.
  • Train the part you added.
  • Unfreeze some layers in the base network.
  • Jointly train both these layers and the part you added.

You already completed the first three steps when doing feature extraction. Let's proceed with step 4: you will unfreeze the conv_base and then freeze individual layers inside it.

As a reminder, this is what your convolutional base looks like:

Layer (type)                     Output Shape          Param #
================================================================
input_1 (InputLayer)             (None, 150, 150, 3)   0
block1_conv1 (Convolution2D)     (None, 150, 150, 64)  1792
block1_conv2 (Convolution2D)     (None, 150, 150, 64)  36928
block1_pool (MaxPooling2D)       (None, 75, 75, 64)     0
block2_conv1 (Convolution2D)     (None, 75, 75, 128)   73856
block2_conv2 (Convolution2D)     (None, 75, 75, 128)   147584
block2_pool (MaxPooling2D)       (None, 37, 37, 128)    0
block3_conv1 (Convolution2D)     (None, 37, 37, 256)   295168
block3_conv2 (Convolution2D)     (None, 37, 37, 256)   590080
block3_conv3 (Convolution2D)     (None, 37, 37, 256)   590080
block3_pool (MaxPooling2D)       (None, 18, 18, 256)    0
block4_conv1 (Convolution2D)     (None, 18, 18, 512)   1180160
block4_conv2 (Convolution2D)     (None, 18, 18, 512)   2359808
block4_conv3 (Convolution2D)     (None, 18, 18, 512)   2359808
block4_pool (MaxPooling2D)       (None, 9, 9, 512)      0
block5_conv1 (Convolution2D)     (None, 9, 9, 512)     2359808
block5_conv2 (Convolution2D)     (None, 9, 9, 512)     2359808
block5_conv3 (Convolution2D)     (None, 9, 9, 512)     2359808
block5_pool (MaxPooling2D)       (None, 4, 4, 512)      0
================================================================
Total params: 14,714,688

You will fine-tune all the layers from block3_conv1 and on. Why not fine-tune the entire convolutional base? You could. But you need to consider the following:

  • Earlier layers in the convolutional base encode more generic, reusable features, whereas layers higher up encode more specialized features. It is more useful to fine-tune the more specialized features, because these are the ones that need to be repurposed for your new problem. There would be fast-diminishing returns in fine-tuning lower layers.
  • The more parameters you train, the more you risk overfitting. The convolutional base has almost 15 million parameters, so it would be risky to attempt to fine-tune it in its entirety on your small dataset.

Thus, in this situation, it is a good strategy to fine-tune only some of the layers in the convolutional base. Let's set this up, starting from where you left off in the previous example.
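With the keras R package, this can be sketched with unfreeze_weights(), keeping everything below block3_conv1 frozen:

```r
# Unfreeze the conv base layers from block3_conv1 onward; earlier layers stay frozen.
unfreeze_weights(conv_base, from = "block3_conv1")
```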

Now you can begin fine-tuning the network. You will do this with the RMSProp optimizer, using a very low learning rate. The reason for using a low learning rate is to limit the magnitude of the modifications made to the representations of the layers you are fine-tuning. Updates that are too large may harm these representations.
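Here is a sketch of that fine-tuning run, reusing the generators from before; the learning rate and epoch count are illustrative, and the model must be recompiled for the trainability change to take effect.

```r
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 1e-5),   # very low rate to limit updates
  metrics = c("accuracy")
)

history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,
  epochs = 100,
  validation_data = validation_generator,
  validation_steps = 50
)
```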

 

Let’s plot our outcomes:

You are seeing a nice 6% absolute improvement in accuracy, from about 90% to above 96%.

Note that the loss curve does not show any real improvement (in fact, it is getting worse). You may wonder how accuracy could stay stable or improve if the loss is not decreasing. The answer is simple: what is displayed is an average of pointwise loss values, but what matters for accuracy is the distribution of the loss values, not their average, because accuracy results from a binary thresholding of the class probability predicted by the model. The model may still be improving even if this is not reflected in the average loss.

Now you can finally evaluate this model on the test data.
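Here is a sketch of the evaluation, assuming the test_dir path and the rescale-only test_datagen defined earlier:

```r
test_generator <- flow_images_from_directory(
  test_dir,
  test_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "binary"
)

model %>% evaluate_generator(test_generator, steps = 50)
```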

 
$loss
[1] 0.2158171

$acc
[1] 0.965

Here you get a test accuracy of 96.5%. In the original Kaggle competition around this dataset, that would have been among the top results. Using modern deep-learning techniques, you reached it with only about 10% of the training data that was available to competitors. There is a big difference between being able to train on 20,000 samples and training on 2,000 samples.

Can convolutional neural networks (CNNs) effectively learn from small datasets? The answer is a resounding yes, provided you follow best practices.

Here is what you should take away from the past two sections:

  • Convnets are the best type of machine-learning model for computer-vision tasks. It is possible to train one from scratch even on a very small dataset, with decent results.
  • On a small dataset, overfitting will be the main issue. Data augmentation is a powerful way to fight overfitting when working with image data.
  • It is easy to reuse an existing convnet on a new dataset via feature extraction. This is a valuable technique for working with small image datasets.
  • As a complement to feature extraction, you can use fine-tuning, which adapts some of the representations previously learned by an existing model to a new problem. This pushes performance a bit further.

You now possess a robust suite of tools for tackling image-classification challenges, especially those involving limited datasets.
