Photo by Milad Fakurian on Unsplash

Transfer Learning

A Brief Experiment

Justin Masayda
3 min read · Feb 24, 2022


Abstract

I demonstrate transfer learning by building a CNN to classify the CIFAR-10 dataset. My architecture consists of DenseNet-121, pre-trained on ImageNet, followed by two dense layers. The model achieves over 87% validation accuracy after 10 epochs.

Introduction

A key prerequisite to achieving a highly accurate neural network is a large amount of labeled data. In the absence of such data, a pre-trained model may be substituted, augmented, and fine-tuned on a smaller dataset. This process is one example of transfer learning: the application of a neural network to a new objective, or to the same or a similar objective in a new domain.

My objective was to construct a model that achieves ≥ 87% validation accuracy on the CIFAR-10 dataset.

Materials and Methods

I used DenseNet-121, pre-trained on ImageNet, as the base of my model, and replaced its output layer with a 10-unit dense layer with softmax activation. I froze the DenseNet-121 layers and trained only the new dense layer. My reasoning was that DenseNet-121 would already recognize low- and mid-level features, and that those features would be common to both ImageNet and CIFAR-10; the interpretation of combined features, however, would be unique to each dataset, since each contains different classes of objects.
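As a minimal sketch of this setup, assuming TensorFlow/Keras and its built-in DenseNet121 application (the post itself doesn't specify a framework), the frozen base and new head could be built like so:

import tensorflow as tf

# Load DenseNet-121 pre-trained on ImageNet, dropping its 1000-class head.
base = tf.keras.applications.DenseNet121(
    weights="imagenet", include_top=False, input_shape=(128, 128, 3)
)
base.trainable = False  # freeze all DenseNet-121 weights

# Attach a 10-unit softmax output layer for the CIFAR-10 classes.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])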

As an analogy, it might be like teaching someone who already reads the Latin script, perhaps an American, to read a foreign language that uses Latin characters, such as Spanish, as opposed to teaching that language to someone who is only familiar with another script, such as Hangul. The American student would only have to learn to recognize Spanish words, whereas the Korean student would have to learn both the Latin alphabet and the Spanish words. If there is already an understanding of the Latin alphabet, there's no reason to teach new characters, and there should be less of a learning curve.

The single layer did not reach over 87% validation accuracy, so I inserted another dense layer of 32 ReLU units prior to the output layer. That allowed the model to learn more complex combinations of features and achieve sufficient accuracy.

The final architecture of my model is as follows, where m is the number of training examples:

Layer name      | Output shape     | Parameters
InputLayer      | (m, 32, 32, 3)   | 0
Image scaling   | (m, 128, 128, 3) | 0
densenet121     | (m, 4, 4, 1024)  | 7,037,504
Flatten         | (m, 16384)       | 0
Dense (ReLU)    | (m, 32)          | 524,320
Dense (softmax) | (m, 10)          | 330

Total parameters: 7,562,154
Trainable parameters: 524,650
Non-trainable parameters: 7,037,504
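As a sketch, again assuming tf.keras, the final architecture could be assembled as follows; the Resizing layer performs the 32 × 32 → 128 × 128 image scaling shown in the table:

import tensorflow as tf

base = tf.keras.applications.DenseNet121(
    weights="imagenet", include_top=False, input_shape=(128, 128, 3)
)
base.trainable = False  # 7,037,504 non-trainable parameters

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(32, 32, 3)),
    tf.keras.layers.Resizing(128, 128),               # image scaling, 0 parameters
    base,                                             # outputs (m, 4, 4, 1024)
    tf.keras.layers.Flatten(),                        # 4 * 4 * 1024 = 16,384
    tf.keras.layers.Dense(32, activation="relu"),     # 16,384 * 32 + 32 = 524,320
    tf.keras.layers.Dense(10, activation="softmax"),  # 32 * 10 + 10 = 330
])
model.summary()  # parameter counts should match the table above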

Results

After 10 epochs, the model reached a training accuracy of 89.5% and a validation accuracy of 89.78%.
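A training run along these lines would match the 10-epoch setup, continuing from the model sketched above; the Adam optimizer, batch size, and use of the CIFAR-10 test split for validation are my assumptions rather than details recorded in the experiment:

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# DenseNet expects its own input preprocessing.
x_train = tf.keras.applications.densenet.preprocess_input(x_train.astype("float32"))
x_test = tf.keras.applications.densenet.preprocess_input(x_test.astype("float32"))

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, batch_size=128,
          validation_data=(x_test, y_test))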

Discussion

This project shows that repurposing a trained model for a new task can be both convenient and effective. Transfer learning is a helpful tool for solving problems when labeled data is limited.

