Recently we’ve been exploring different ways to extract features from images using unsupervised machine learning techniques. Typically, getting into deep learning requires gathering huge numbers of images that have been classified or annotated so that we can feed them into our network to train it. Luckily, deep learning libraries like Keras come with several pre-trained deep learning models right out of the box, which we can use to get started with very little effort.
Keras comes with six pre-trained models, all of which have been trained on the ImageNet database, a huge collection of images classified into 1000 categories of different objects like cats and dogs. They include the four models we compare below: ResNet50, InceptionV3, VGG16 and VGG19.
Looking at these different choices raises the question: which one of these is going to be the most effective at differentiating between our images? Fortunately for us, this is where Zegami can help.
For our collection we are using images from the Egyptology collection of the National Museum of Antiquities in The Netherlands. This is a really interesting and unique collection of images that makes a great test of our feature extraction, mainly because the objects all come from a relatively narrow field and none of them appear in the ImageNet database.
Out of the box, Keras comes with a number of pre-trained deep learning models (https://keras.io/applications/). As mentioned, these models have been trained to recognise 1000 different categories from the ImageNet database. However, we can also use them to extract a feature vector (a list of 2048 floating point values): the model’s internal representation of an image.
To get started with Keras we first need to create an instance of the model we want to use. In this example we are using the ResNet50 model.
from keras import applications

model = applications.resnet50.ResNet50(weights='imagenet', include_top=False, pooling='avg')
Here we set weights to 'imagenet', which automatically downloads the parameters learned from training on the ImageNet database. The next important argument is include_top=False, which removes the fully connected layer at the end (top) of the network. This allows us to get a feature vector instead of a classification.
Once the model is initialised we can pass it an image and ask it for a prediction. However, since we removed the top layer we won’t get a classification; instead we get the feature vector: a list of 2048 floating point values.
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input
import numpy as np

# load the image, resizing it to 224 x 224
img = image.load_img(img_path, target_size=(224, 224))

# convert the image to a numpy array
x = image.img_to_array(img)

# the image is now an array of shape (224, 224, 3);
# the model expects a batch, so expand it to (1, 224, 224, 3)
x = np.expand_dims(x, axis=0)

# apply the same preprocessing used when the model was trained
x = preprocess_input(x)

# extract the features
features = model.predict(x)

# convert from a numpy array to a list of formatted values
features_arr = np.char.mod('%f', features)
We first load the image and convert it into an array of RGB values, then pass it into the model’s predict method to extract our feature vector.
Once we’ve extracted our feature vectors, we are going to use a second unsupervised machine learning technique called dimensionality reduction to take the number of items in each feature vector (its dimensions) from 2048 down to 2. By reducing the dimensions this way we can easily visualise the relationships between the vectors on a scatter plot and identify clusters of similar-looking images.
For this we are going to use t-Distributed Stochastic Neighbor Embedding, also known as t-SNE, which comes with scikit-learn.
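As a minimal sketch of this step, assuming the feature vectors have already been extracted into an array of shape (number of images, 2048), the reduction looks like this (random data stands in for the real features here):

```python
import numpy as np
from sklearn.manifold import TSNE

# placeholder for the real extracted feature vectors: 50 images, 2048 values each
features = np.random.rand(50, 2048)

# reduce the 2048 dimensions down to 2 so each image becomes an (x, y) point
tsne = TSNE(n_components=2, perplexity=15, random_state=0)
reduced = tsne.fit_transform(features)

print(reduced.shape)  # (50, 2)
```

Each row of `reduced` is now a 2D point that can be plotted, with nearby points corresponding to images the model considers similar.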
Zegami is of course an excellent tool to help us visualise our two dimensions using the scatter plot filter. In the following image I’ve compared the reduced feature vectors of four pre-trained models: ResNet50, InceptionV3, VGG16 and VGG19.
As you can see there are some interesting differences between each of the four models. We can also use the filter to draw a lasso around each of the clusters within the plot to explore the similarities between the images.
The full example can be viewed at https://demo.zegami.com/the%20national%20museum%20of%20antiquities
Overall, using pre-trained models like this is surprisingly effective at differentiating between the different types of objects, despite the fact that they haven’t been trained on these kinds of images.
The full code used to extract the features and run the t-SNE is available on GitHub.