Advent of Code 2019 in 130 ms: days 11

This is part of a series of posts, starting with Advent of Code 2019 in 130 ms: introduction.

Day 11: Space Police

Day 11 is again mostly about running an Intcode program. My solution runs in 2.2 ms and doesn’t do anything special.

For part 2 I guess it’s debatable whether a proper solution should parse the characters from the generated image. My solution does not, and just prints the image instead, but doing so probably wouldn’t significantly increase runtime - runtimes for the Intcode challenges tend to be dominated by executing the Intcode program.

Day 12: The N-Body Problem

For challenge 12, the main performance trick is to realize that the three dimensions are completely independent of one another, so you can model each dimension separately. Once you find the cycle period for each dimension, the period for all dimensions combined is the least common multiple of the separate periods. In fact, this trick is probably necessary to solve part 2 at all.

Another significant performance boost is that in order to detect a cycle, you don’t need to compare with every previous state, just the very first one. This is because we can’t enter a cycle without already being inside it, which can be seen from the fact that the state update rule is symmetrical in time. With p(i, t) and v(i, t) representing the position and velocity, respectively, of moon i at time t, the update rule going forward in time is:

v(i, t + 1) = v(i, t) + sum(sign(p(j, t) - p(i, t)) for j ≠ i)
p(i, t + 1) = p(i, t) + v(i, t + 1)

If we work backward through time, we have to update the position first and the velocity afterwards, and flip the sign of the gravity term and the velocity term:

p(i, t - 1) = p(i, t) - v(i, t)
v(i, t - 1) = v(i, t) - sum(sign(p(j, t - 1) - p(i, t - 1)) for j ≠ i)

This can be rearranged without changes to:

v(i, t) = v(i, t - 1) + sum(sign(p(j, t - 1) - p(i, t - 1)) for j ≠ i)
p(i, t) = p(i, t - 1) + v(i, t)

which is the same update rule as for forward time, just shifted one step in time. This means that every state can be preceded by only one other state, which means that all cycles extend infinitely both forward and backward in time. This optimization, comparing only against the initial state, cut my execution time from ~240 ms down to ~35 ms - and that was with hash sets, which should in theory make the duplication check take only O(1) time.

The final performance improvement I had missed was that although I was checking for cycles in each dimension separately, I still kept the state for each moon in a package with all three dimensions - and I was therefore still updating all three dimensions even after finding the cycle in two of them. By separating the state into three fully separate vectors, I avoided some more unnecessary work after finding each cycle period which dropped the runtime for my solution to 18 ms.

Day 13: Care Package

Day 13 is also mostly about running an Intcode program. My solution runs in 8.0 ms and, again, doesn’t really do anything fancy. The “AI” for winning the game simply keeps the paddle under the ball at all times.

One small performance trick is that you can ignore most of the output from the game - for part 1 you only need the block tiles, and for part 2 you only need the paddle and ball positions. This saves a little bit of work, but the runtime is still dominated by the Intcode program so it doesn’t make much of a difference.

Day 14: Space Stoichiometry

Challenge 14 is about computing the total materials needed to manufacture a product via tree of unbalanced recipes, and there’s a lot of optimization potential here! My solution runs in 470 μs and works by using the solution for part 1 as a pessimistic estimate of the ORE cost per fuel, then iteratively refining the part 2 solution using leftover materials to reduce the cost for the next iteration.

The solution starts by parsing the puzzle input into a hash map of recipes, mapping the output product to the number of that product and a hash map from ingredients to number of that ingredient (so basically the lines of the puzzle input, but read right-to-left).

recipes:
  FUEL:
    - 1
    - Solid Fuel: 10
      Light Oil: 10

  Solid Fuel:
    - 1
    - Light Oil: 10

  Light Oil:
    - 30
    - ORE: 100

Listing 1: Example recipes map

Next, we set up a hash map of orders, mapping an ingredient name to how much of that ingredient we need to produce. An important note here is that this number can be negative, meaning we have a surplus of that ingredient. We initialize this map with FUEL: 1, as for part 1 we need to compute the total cost of 1 FUEL.

orders:
  FUEL: 1

Listing 2: Initial orders map

Now, we loop as long as the map of orders has at least one entry other than ORE greater than zero. We look up the recipe for that ingredient, compute how many times k we need to manufacture by that recipe to cover the order, subtract k times the number of products from the orders map, and add k times the number of each ingredient to the orders map. When this loop exits, the orders map will contain FUEL: 0, ORE: <part 1 solution>, and a bunch of negative entries representing various surplus intermediate products. This is where the fun starts.

orders:
  FUEL: 0
  Solid Fuel: 0
  Light Oil: -10
  ORE: 400

Listing 3: Orders map after producing 1 FUEL

For part 2, we need to compute how much fuel f we can produce with 1 trillion ORE. We can’t use the above algorithm to compute this, because that computes ingredients for a given order, and now we need the maximum order for a given quantity of ingredients. But we can make use of the part 1 solution: because we started with 0 of everything, that number is an upper bound for the cost of 1 unit of ORE.

So what we do is use that number as an estimate of how much FUEL we can make. We let o be the (variable) ORE: o entry in our orders map, k = floor((1,000,000,000,000 - o) / f), and add FUEL: k to the orders map. Then we run the production algorithm again, which will again compute how much ORE is needed to produce that much FUEL. The difference is that this time, the orders map starts out with a surplus of some intermediate products, so we’ll need less total ORE per FUEL this time.

k = (1,000,000,000,000 - 400) / 400 = 2,499,999,999
orders:
  FUEL:       2,499,999,999
  Solid Fuel: 0
  Light Oil:  -10
  ORE:        400


// Produce the fuel:
orders:
  FUEL:       2,499,999,999 - 1 * 2,499,999,999 = 0
  Solid Fuel: 0 + 10 * 2,499,999,999            = 24,999,999,990
  Light Oil:  -10 + 10 * 2,499,999,999          = 24,999,999,980
  ORE:        400


// Produce the solid fuel:
produce(orders) =>
orders:
  FUEL:       0
  Solid Fuel: 24,999,999,990 - 1 * 24,999,999,990  = 0
  Light Oil:  24,999,999,980 + 10 * 24,999,999,990 = 274,999,999,880
  ORE:        400


// Produce the light oil:
produce(orders) =>
orders:
  FUEL:       0
  Solid Fuel: 0
  Light Oil:  274,999,999,880 - 30 * 9,166,666,663 = -10
  ORE:        400 + 100 * 9,166,666,663            = 916,666,666,700


k = (1,000,000,000,000 - 916,666,666,700) / 400 = 208,333,333

Listing 3: Orders map producing 2,499,999,999 more FUEL

We update k = floor((1,000,000,000,000 - o) / f) from the new o value in the orders map, and again add FUEL: k to the orders map and run the production algorithm. This time 1,000,000,000,000 - o will be smaller since o is greater, so k will be smaller this time. We continue this process until we reach k = 0, at which point we continue with k = 1 instead as there may still be more surplus intermediate products available. Finally, we sum up all the ks we were able to produce without having o go over 1 trillion, and add 1 for the one unit of FUEL we produced in part 1.

Day 15: Oxygen System

Day 15 is again an Intcode challenge, but this is the first where unlike the previous ones, the Intcode program acts more like a function your program can query for extra information. Runtime optimization for this problem is therefore mostly about minimizing the number of calls we need to make into the Intcode program, since that is much slower than the parent program.

Part 1 has you build a map of an unknown area, and part 2 has you count the maximum number of steps needed to get from the oxygen system to any other location on the map. Technically, part 1 only requires you to find the shortest path to the oxygen system, but we need to explore the whole map for part 2 anyway.

My solution runs in 1.3 ms and uses a little trick to reduce the amount of Intcode it needs to run: although the problem is described as navigating a single drone around an unknown labyrinth, we don’t have to maintain the immersion. When we find a dead end, instead of navigating the drone back to the unexplored part we can just “teleport” it back by reverting the Intcode machine to a previous state, as shown in figure 3. This does require extra work to make lots of copies of the Intcode machine state, but ends up saving time by significantly reducing the amount of Intcode we need to run. Moving from a depth-first search with a single Intcode machine to a breadth-first search with copied Intcode machines reduced my runtime from ~2.0 ms to ~1.5 ms.

###.#    ###.#    ###.#    ###.#    ###.#    ###.#    ###.#    ###.#
# #..    # #..    # #..    # #..    # #..    #?#..    #?#..    #.#..
#  ##    # ?##    #??##    #?###    #?###    #.###    #^###    #.###
# ?<#    #?<.#    #.^.#    #v..#    #^..#    #...#    #...#    #...#
# #.#    # #.#    #?#.#    #?#.#    #.#.#    #v#.#    #.#.#    #.#.#
# #..    # #..    # #..    # #..    #?#..    #?#..    #.#..    #v#..
#  ##    #  ##    #  ##    #  ##    #  ##    #  ##    #? ##    #? ##

Figure 3: Breadth-first search by copying the Intcode machine state to avoid backtracking. . are floors, # are walls, ? are tiles in the exploration queue, arrows indicate which tile will be explored next.

So my solution uses a pretty standard breadth-first search to explore the map. It keeps a map of the world and a queue of exploration states, each state containing the drone’s position, the current exploration direction, the number of steps taken from the start, and an instance of the Intcode machine. For each state in the queue, it attempts to move the drone one step in the exploration direction. If successful, it records a floor tile on the map and adds three new states to the queue: forward, left and right at the new position, each with its own copy of the Intcode machine. If the new position is the oxygen system, it sets the part 1 solution to the number of steps taken from the start, since the queue guarantees this is the shortest path to get there. If it instead hits a wall, it records a wall on the map and adds no new states to the queue. When the queue is empty, we have the whole map. For part 2, we simply start at the oxygen system and expand outward along the floor tiles one step at a time, recording how many steps are needed to cover the whole map.

There are also a couple of Rust-specific performance tricks here. Both when initializing the queue with the 4 cardinal directions, and when adding new states to the queue, a loop first adds all but the last new state to the queue, making a .clone() of the Intcode machine for each new state. The last new state, though, is added outside the loop so that the Intcode machine can be moved instead of cloned. This saves about 110 μs over cloning for each new state. For part 2, I use two persistent Vecs to store the current and next positions to explore, using mutable references to read and write to them, and use std::mem::swap to swap which Vec is the current vs. the next positions. This saves having to reallocate a new Vec for each step, or move values between Vecs. It saves something like 20 μs, but it’s hard to measure since the difference is quite small.