An Algorithm for "Visually Appealing" Image Ordering
- When displaying a set of images, some orders are more interesting than others.
- One way to create an interesting sorting order of images is to make sure adjacent images are dissimilar.
- You can use computer vision to determine dissimilarity between images.
- You can then use a genetic algorithm to globally optimise the sorting order of images for most dissimilarity between adjacent images.
Imagine you have a number of images that you care to show to a user, either using a slider or some other layout. Whatever the layout, there is some inherent ordering; people don't look at a screen all at once and op top of that scrolling or swiping may be involved.
You care to explicitly order images in a way that is fit for purpose. In some situations there is a well defined order that is known to work well. Product image galleries commonly present overviews before details, for example. For other situations, such an implied order does not exist. Here you want to optimise for making it look "more interesting".
Take the following set. It has three dogs, three cars, three boats, and a floating van. (Image credits are at end of the article.)
The default ordering of the images above neatly sorts them so all objects of the same class are adjacent (first the boats, then the cars, then the dog, finally the van). There are two good reasons for mixing things up a bit:
- It will look more appealing (citation needed, but you can judge for yourself below).
- Increasing the entropy of what is presented yields stronger preference signals when a user decides to click (citation needed, because I can't really talk about places where we actually tested this and it worked).
So, here is the official friso.lol method for ordering images in a visually appealing way: optimise the order of a list of images for maximum visual dissimilarity between adjacent images.
Now we have two problems: 1) define visual dissimilarity, and 2) find the optimal order of a list based on some global property of the list. The first is subjective and the second is NP-hard.
Note that this is a condensed technical article, so I will blatantly assume familiarity with all tools and techniques mentioned. It is not about how to do this, but about that it can be done in this way using an ensemble of existing techniques. Nonetheless, the actual code is provided below.
The problem of image dissimilarity sounds like a computer vision problem, so you will need to use deep learning in order to be taken seriously and keep your job. Luckily, deep learning models are very good at projecting images into a vector space. And for vector spaces we have several adequate definitions of distance, which loosely translate to dissimilarity. Here is the playbook:
- Use an existing, pre-trained image classification network, with the top layer removed and use the output as a vector representation of the input image.
- Define the dissimilarity between two images as the cosine distance between their vector representations.
For the set above, each image is represented as the 512 feature vector obtained using a VGG16 model pre-trained on imagenet, without the top layer. Here is the resulting pairwise cosine distance matrix. The original image set is included again just below for convenient reference.
You can see that the light patches are distances between images that belong to the same class. Also, the floating van is more dissimilar from dogs than from cars or boats. Finally, the car photographed from the rear looks less car-like than the other two cars and actually bears some (very) remote resemblance to a boat.
Optimising Image Order for Appeal
When ordering the images, you want adjacent images in your presented order to be dissimilar. That means that jumping from one image to the next, there is a noticeable visual difference between the two. You do not want two very similar looking dogs to be next to each other.
To optimise for this goal, maximise the mean distance between adjacent images in your order. The search space of possible orders is the factorial of the number of images, so you can not realistically brute force this for more than a handful of images. Genetic algorithms to the rescue! Here is the next playbook:
- Define the mean distance between adjacent images as fitness function for a given order.
- Run a genetic algorithm to optimise the image order.
Here is one possible order obtained using this method: (click to open in a new tab)
Take note that the thing that could be either a car or a boat is placed next to a dog. The same happened to the car that is not as car-like as the others. All else is nicely alternating.
For comparison, here is one possible order obtained by flipping the optimisation target to minimise instead of maximise. This is a purposely created boring order according to the same method: (click to open in a new tab)
You can open both in a separate tab and switch between tabs to experience the difference.
No particular tuning was done on the genetic algorithm. The initial population is randomly sampled from permutations of orders and has 200 individuals. The crossover (mating) function is a Partially Mapped Crossover (PMX). Each generation the top 50 individuals survive. Termination occurs fixed after 30 generations without any other criteria.
Where Is the Code?
Q: What if my images are not evenly divided across classes or all the same class?
This approach will still optimise for distance between adjacent images. In practice, it tends to still distinguish quite well between visually disparate instances of the same class.
Optimised example with only dogs: (click to open in a new tab)
Boring example with only dogs: (click to open in a new tab)
Optimised example with only shirts: (click to open in a new tab)
Boring example with only shirts: (click to open in a new tab)
Q: What if my images are out of the domain of imagenet and the extracted features do not make sense?
You could try to use a deeper layer of the network, aiming at a lower abstraction of visual features. But better is to construct a ground truth within your own domain and transfer learn the network.
Q: Why did you choose mean distance between adjacent images as fitness function?
Because it is simple and produces reasonable results. I guess that for some distributions of pairwise distances, it might be problematic.
Q: Why not just randomise the order?
For many real world distributions of pairwise distances, the boring orders vastly outnumber the appealing ones.
Image Credits and Full View
(See what I did there?)