{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Optimising the Order of a List of Images for Maximum Visual Dissimilarity Between Adjacent Images\n", "\n", "In this notebook we use visual features obtained using a pre-trained deep neural network to determine visual dissimilarity between images. Then we use a genetic algorithm to find an ordering of a list of images that maximises the mean distance between adjacent images. Distance or dissimilarity between images is defined as the cosine distance between the vector representation of an image obtained by pushing the image through a pre-trained neural network." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import tensorflow as tf\n", "import matplotlib.pylab as plt\n", "import evol\n", "import glob\n", "\n", "from scipy.spatial import distance\n", "\n", "from plotnine import *\n", "import plotnine.options" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "plotnine.options.figure_size = (16,9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Image Credits\n", "\n", "The images are loaded from disk in this notebook, but were originally sourced from Flickr.\n", "\n", "Original images are linked here for source and credit:\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "image_files = sorted(glob.glob('../flickr-multi/*'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Image Feature Vectors\n", "\n", "To project images into a feature space, we use an instance of the VGG16 neural network pre-trained on imagenet, with the top layer cut off and 
average pooling applied. This turns an arbitrary image into a 512-dimensional feature vector.\n", "\n", "We load the images twice, once as-is for display and once with preprocessing for the network." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "pretrained_net = tf.keras.applications.VGG16(\n", " include_top=False,\n", " weights='imagenet',\n", " pooling='avg',\n", " input_shape=(224,224,3)\n", ")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# The single-letter var is apparently Keras' preferred style\n", "input_layer = tf.keras.layers.Input([None, None, 3], dtype=tf.uint8)\n", "x = tf.cast(input_layer, tf.float32)\n", "x = tf.keras.applications.vgg16.preprocess_input(x)\n", "x = pretrained_net(x)\n", "model = tf.keras.Model(inputs=[input_layer], outputs=[x])" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "original_images = [\n", " tf.keras.preprocessing.image.load_img(img_file)\n", " for img_file in image_files\n", "]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "input_images = np.array([\n", " tf.keras.preprocessing.image.img_to_array(\n", " tf.keras.preprocessing.image.load_img(img_file, target_size=(224,224))\n", " )\n", " for img_file in image_files\n", "])" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "TensorShape([10, 512])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "image_vectors = model(input_images)\n", "image_vectors.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pairwise Distance Matrix and Visualisation\n", "\n", "For visual inspection of our distance metric, we create a pairwise distance matrix in a `DataFrame` and visualise it using a tile plot with some annotations."
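, "\n",
 "As a reminder, for two feature vectors $u$ and $v$ the cosine distance used here is\n",
 "\n",
 "$$d(u, v) = 1 - \\frac{u \\cdot v}{\\lVert u \\rVert \\, \\lVert v \\rVert}$$\n",
 "\n",
 "so identical directions give distance $0$ and orthogonal vectors give distance $1$; `scipy.spatial.distance.pdist` computes this for every pair of rows when passed `distance.cosine` as the metric."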
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def show_images(images, rows=2):\n", " \"\"\"\n", " Helper function to display a list of images across multiple rows.\n", " \"\"\"\n", " # The double negation is a silly and hard to mentally parse trick to round up integer divison,\n", " # but now you know...\n", " cols = -(-len(images) // rows)\n", "\n", " fig,ax = plt.subplots(\n", " rows, cols, figsize=(16,9), squeeze=False,\n", " gridspec_kw=dict(wspace=0, hspace=0))\n", " for i in range(rows * cols):\n", " ax[i // cols][i % cols].axis('off')\n", " if i < len(images):\n", " ax[i // cols][i % cols].imshow(images[i])\n", " ax[i // cols][i % cols].text(0, 0, str(i), fontsize=22)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "distance_frame = (\n", " pd.DataFrame( # Construct a DataFrame\n", " distance.squareform( # from the square form\n", " distance.pdist(image_vectors, distance.cosine) # of the pairwise cosine distance matrix\n", " # between images' vector representations\n", " )\n", " )\n", " .reset_index() # Use source image as column\n", " .rename(columns={'index':'from_image'})\n", " .assign(from_image=lambda df: df['from_image'].astype('category')) # Turn into categorical\n", " .melt(id_vars=['from_image'], var_name='to_image', value_name='distance') # Un-pivot for plotting\n", ")\n", "\n", "# Add a formatted representation for geom_text()\n", "distance_frame['text_distance'] = distance_frame['distance'].apply(lambda value: '{:.3f}'.format(value))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | from_image | \n", "to_image | \n", "distance | \n", "text_distance | \n", "
---|---|---|---|---|
0 | \n", "0 | \n", "0 | \n", "0.000000 | \n", "0.000 | \n", "
1 | \n", "1 | \n", "0 | \n", "0.522615 | \n", "0.523 | \n", "
2 | \n", "2 | \n", "0 | \n", "0.352597 | \n", "0.353 | \n", "
3 | \n", "3 | \n", "0 | \n", "0.782678 | \n", "0.783 | \n", "
4 | \n", "4 | \n", "0 | \n", "0.785532 | \n", "0.786 | \n", "