I recently found myself playing a small online game. The game has some tedious aspects, such as resource-gathering activities, that I just can’t be bothered to spend my limited time on. So naturally I looked for a way to create a little bot to automate them.

Not wanting to mess around with figuring out network payloads or attaching to the game’s process, I opted for a non-invasive GUI automation approach. The idea was to take screenshots of the game, process each image to find the desired resources, then move to and gather them. I know there are already a lot of great packages out there that do image recognition, like OpenCV or pyautogui. However, I decided to roll my own implementation because, to quote Richard Feynman, “What I cannot create, I do not understand”.

The technique I went with was simple template matching: you take a template image, called the needle, and look for it in a screenshot, the haystack.

Here is an example screenshot taken from the game (our haystack).

We are then looking for all the mine-able nodes (the needle).

The process to find it is quite simple (a sketch of the loop follows the list):

  1. Starting at coordinates (0, 0), grab a rectangle of pixels the size of the needle image.
  2. Compare the needle image to the current rectangle of pixels selected.
  3. If the images are a match then add it to the set of found rectangles.
  4. Increment your coordinates and repeat from step 1 until every position in the haystack has been searched.
  5. Return all the matched rectangles in the haystack.
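
Here’s a rough sketch of that loop in Python with NumPy (my actual code is in the repo linked at the end; the names here are purely illustrative, and `images_match` is the comparison function described next):

```python
import numpy as np

def find_needle(haystack, needle):
    """Slide a needle-sized window over the haystack, collecting matches.

    Both images are numpy uint8 arrays, shaped (H, W) or (H, W, 3).
    Returns a list of (x, y, w, h) rectangles, one per match.
    """
    nh, nw = needle.shape[:2]
    hh, hw = haystack.shape[:2]
    found = []
    for y in range(hh - nh + 1):                   # step 4: walk every coordinate
        for x in range(hw - nw + 1):
            window = haystack[y:y + nh, x:x + nw]  # step 1: grab a rectangle
            if images_match(window, needle):       # steps 2 and 3: compare, record
                found.append((x, y, nw, nh))
    return found                                   # step 5
```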

The magic is in determining a match between the selected region and the needle image. There are a lot of different algorithms out there for doing this, but because this game doesn’t deal with things like varying light intensity, I was able to use a simple sum of absolute differences to compare the images.
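
A sum of absolute differences is exactly what it sounds like: subtract the two images pixel by pixel, take the absolute values, and sum (or average) them; if the result falls under some threshold, call it a match. Something along these lines, with the threshold left as a tuning knob:

```python
def images_match(window, needle, threshold=10):
    """Compare two equally-sized images with a sum of absolute differences.

    Casting to a signed type first avoids uint8 wrap-around on subtraction.
    threshold is the maximum mean per-pixel difference to accept; tune it
    to whatever works for your images.
    """
    diff = np.abs(window.astype(np.int32) - needle.astype(np.int32))
    return diff.mean() <= threshold
```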

After wiring everything up, it was working… slowly.

The problem is that the haystack image can be quite large, so scanning the entire image and comparing every position against the needle is quite time consuming: that’s roughly width × height comparisons, each one a full needle-sized diff. The solution is to use an image pyramid. You downsample your haystack and needle images several times to get smaller and smaller images (I downsized by half for each level). You then start by searching the smallest image. If you find a match there, you search only that matched region in the next layer up. You continue this until you reach the full-size image. This speeds things up significantly.
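
Building on the two sketches above, the pyramid search might look something like this (again illustrative Python, halving each level as I did; the padding values are just a guess at absorbing downsampling rounding):

```python
def downsample(img, factor):
    """Naive downsample by striding; fine for crisp game sprites."""
    return img[::factor, ::factor]

def pyramid_search(haystack, needle, levels=3):
    """Scan only the smallest tier fully, then refine hits tier by tier."""
    # Build the pyramids, index 0 being the full-size images.
    hays, needles = [haystack], [needle]
    for _ in range(levels - 1):
        hays.append(downsample(hays[-1], 2))
        needles.append(downsample(needles[-1], 2))

    # The expensive full scan happens only on the smallest tier.
    hits = find_needle(hays[-1], needles[-1])

    # Walk back up the pyramid, re-searching a small padded region
    # around each hit at the next-larger tier.
    for level in range(levels - 2, -1, -1):
        hay, ndl = hays[level], needles[level]
        nh, nw = ndl.shape[:2]
        refined = []
        for x, y, _, _ in hits:
            # Coordinates double from one tier to the next; the padding
            # absorbs rounding error introduced by the downsampling.
            x0, y0 = max(2 * x - 4, 0), max(2 * y - 4, 0)
            region = hay[y0:y0 + nh + 8, x0:x0 + nw + 8]
            for rx, ry, rw, rh in find_needle(region, ndl):
                refined.append((x0 + rx, y0 + ry, rw, rh))
        hits = refined
    return hits
```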

Without image pyramids, checking a 1045x781px image takes ~27 seconds. Using a 3-tier image pyramid (where the smallest image is 261x195px), it takes ~0.1 seconds.

Here is a slowed-down video of what that search looks like:

With the image recognition in place, we can tie it together with the rest of the bot and voilà: we have a bot that can gather resources (and handle simple pathing and combat interactions)!

The source code for the image matching can be found here. Be warned: it’s untested and rough. This was for fun, not production.