Optimal image alignment

I have nine PNG images similar to these ones:

I want to align them in a 3x3 grid without padding. Of the tons of possibilities there exist, I want to choose the one where the sum of all edge differences is as small as possible, i.e. for all touching edges, the sum of absolute differences of each edge pixel value is the smallest (assuming all grayscale). Edges at the rectangle's borders don't matter.
Also, it is possible to rotate each image by 180°, changing which edges touch which.

The following illustration shows a minified version of the problem with four images and 6 b/w pixels each. Both are possible alignment options, but the latter has 5 conflicts and the former only 4 (conflict means edge pixels being different in this case).

If I calculated correctly, there would be 9! * 2^9 = 185,794,590 possibilities I would have to check. Is this feasible?

How would I go on to solve this problem? If there already is an algorithm (perhaps from graph theory) that I could use, do you know it and (as a second step) are there crates that implement it? Which crates can be used for accessing the edge pixels?

It's feasible.

There's only a few hundred ways for edges to border, so I would precompute those and then tackle the placement problem.

As a first stab I'd probably go with a backtracking approach: choose some order to fill the grid. For each position, choose an available image plus orientation, and keep track of the partial weight that results so far, then recurse. At the bottom you'll have a full grid; also keep track of the minimum total weight (and which images it resulted from).

Then you backtrack and choose a new image for the second to last position, and then consider the last position again. And once you run out of choices for the second to last position, you return to the position before that. This is like traversing a giant tree. However, you need not traverse the entire tree: any time your partial weight exceeds the minimum total weight you've seen (or even better, you know there's no possible combinations in the subtree that wouldn't exceed the minimum given your partial weight), you can skip recursing.

You mentioned 180 rotation but not vertical or horizontal mirroring, was that intentional?

If so, you can immediately cut the number of combinations to consider in half by noting that rotating the composed grid itself by 180 doesn't change which edges border each other, while also inverting which individual images have been rotated. So for example, you can restrict yourself to never rotating more than 4 images when searching for the minimum.

If not, there's more possible combinations, but also more symmetries.

image is probably the most straightforward crate to read in the image data.


I believe that evaulating all ~200_000 combinations might be feasible. Note that the number of possible boundaries is small, less than 36*35 . (You have 9 * 4 edges.)
So, first compute the boundaries, than evaluate each combination.

Yes, that was intentional. They are real physical images that I can't mirror right now (although that might be an option in the future).

So, the edge-counting looks pretty straight-forward now. I could iterate over all pairs of distinct images (and rotated ones), calculate their edge deltas for each edge and collect it into a HashMap<(Image, Image, EdgeCollision), f64> where Image would be contain an image ID and a bool for whether it's been rotated and EdgeCollision would be an enum of LeftRight, RightLeft, TopBottom and BottomTop.

But I still don't really get the backtracking part. I read the Wikipedia page and understand how it could be used to model sudoku, but how is it applicable to my scenario? In sudoku, there are "hard conflicts" that just aren't allowed, which is when you can backtrack until the error is resolved. My problem is, on the other hand, an optimization problem, and I don't see how that would be compatible with the outlined algorithm scheme.

I'm really struggling to visualize this or the structure of the resulting code. Let's say I chose the first one for every choice I had to make. I'm now at the bottom of the tree. How would I recursively go up? I don't have a real tree with nodes and parents after all (although the subtree optimization part definitely makes sense). Could you maybe show a small (pseudo)code example on how I could do this?

How did you get to that number? I think the number of unrotated combinations would be 9! = 362,880. Factoring in the rotations, without optimizations I could flip each image individually or not, resulting in 9! * 2^9 = 185,794,560 combinations. As @quinedot correctly stated, I could however exclude all combinations that have more than 4 images flipped. This results in
9! * 2^8 = 92,897,280

Oh, sorry. Missed some zeros. Should be 200_000_000. I believe that evaulating all those might be actually feasible, because the underlying computation seems super simple. The only simplification I can see is the following one, which I believe is pratically mostly irrelevant, at least for your samples.
Track the current minimum. If a later configuration goes above it already on thr say 3 image, you can stop there.

1 Like

FYI the name for what you're describing is "Branch and Bound"


Yes, that's reasonable. The numbers are so small here it doesn't really matter, but the same symmetry optimization is possible (A left of B has the same weight as rot(B) left of rot(A), etc) and you only have to actually calculate half of those.

Unless your bit depth or images are massive, u32 is probably enough to store the weights.

Let's say you picked a spiral pattern starting in the middle; at each step you add an image with a certain number of new bordering edges:

  • 0, 1, 1, 2, 1, 2, 1, 2, 2

Let's say you've just placed the sixth image and your partial weight is now P. You also know the minimum solution you've seen so far [1], M. You also know that you have 5 edges left to add. From calculating weights earlier, you know that 5 edges adds at least S[5] to the weight.

If P + S[5] > M, there's no point in continuing to place images: this configuration of six images can't possibly result in a lower weight, no matter how you place the last three. (And if it's equal you only need to continue if you want all minimal solutions, in contrast with just any minimal solution.)

You keep a stack of choices and/or use recursion to keep state on the call stack. You push on the stack to go down the tree and pop to go up. The branches at each level are which image you choose to place.

Here's an example.

If you changed the minimum tests to <= instead of <, this example will run through all permutations (modulo the symmetry check). That takes under 15 seconds on my laptop, so like @MichaelV has said you could just forgo more optimization and brute force it.

  1. initialized to something too big to be possible ↩︎

1 Like

I couldn't help myself and implemented a faster version based on TAOCP Volume 4A. This runs the complete tree in less than 3 seconds on my laptop and is fast enough to complete on the playground. The main TODO is on line 145 and see line 168 if you want to see how long the entire traversal (9!, 256 times) takes on your hardware. Also would be easy to parallelize (see line 119), though every task would have to find it's own minimum instead of sharing a global minimum (i.e. less subtree pruning).


Wow, this is great! Thank you for the work!

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.