This is part of a series of posts, starting with Advent of Code 2019 in 130 ms: introduction.

Day 1: The Tyranny of the Rocket Equation

The first challenge was, as usual, the simplest, and has the shortest runtime by far: 2 μs. There’s not much to say about my solution - the performance is all thanks to Rust’s very efficient standard library. Since it’s already so much faster than all the other solutions, I didn’t really bother optimizing it further.

Day 2: 1202 Program Alarm

The second challenge kicks us off with the Intcode machine language, which will be a recurring theme throughout the whole season. My solution runs in 5.4 ms, and is again pretty straightforward - most of the problem lies in implementing the Intcode machine, which I extracted to a separate module.

Assuming you treat the input program as an opaque black box, getting this solution to run fast is mostly a matter of optimizing the Intcode engine. I’ll get back to that later, as the language will expand over the next few challenges. It may be possible to reverse-engineer the input program and implement that as more efficient code, but throughout the season I chose to make as few assumptions as possible about the behaviour of the given Intcode programs. Consequently, I also did not try to “compile” them or optimize them on the fly, since any of them could in theory be self-modifying programs.

The one thing I did do to optimize this solution was to actually read the problem description, which states in part two that “Each of the two input values will be between 0 and 99, inclusive.”. I had apparently missed this instruction and set my iteration limits to the length of the program, which is 133 for my puzzle input. This brought runtime down from ~7 ms to the ~5 ms.

Day 3: Crossed Wires

Update 2020-08-27: Reddit user askalski suggested a better solution, which reduces runtime for this solution to 28 μs.

Day 3 is the first challenge where choice of algorithm is important. My solution runs in 20 ms, which is actually the slowest of my solutions as I haven’t come up with a better algorithm yet. The algorithm works like this:

  1. Parse each of the wires into a vector of points.
  2. Copy each list into a hash set.
  3. Compute the intersection of the two sets.
  4. For part 1:
    1. Of the points in the intersection, find the one with the lowest Manhattan norm.
  5. For part 2:
    1. Convert each vector of points into a hash map from points to the index of that point in the list.
    2. Of the points in the intersection, find the point with the lowest sum of its indices from the two maps.

Although computing the intersection of two sets would naïvely be an O(N2) operation, this is reduced to O(N) thanks to the use of hash sets. Checking for existence in a hash set is an O(1) operation (see figure 1), as is lookup in a hash map, so constructing both hash maps, both sets and computing their intersection are all O(N) operations as they are all a matter of iterating through a collection of N items. The set of intersections will be much smaller than either vector of points, so once we have that we can very quickly compute the other quantities we need.

Figure 1: Computing set intersection directly from vectors vs. using hash sets

My initial solution started out at ~27 ms, and I was able to optimize a bit by optimizing the point representation. I first used a tuple of integers, (i64, i64), which I replaced with a custom tuple struct Point(i64) which uses bitwise logic to store both coordinates in a single 64-bit integer. This bumped the runtime down to ~24 ms, so I speculate that Rust’s default Hash implementation for i64 is faster than that for (i64, i64) - presumably about twice as fast - so the hash sets and hash maps operate a bit faster. I then took a closer look at the minimum and maximum coordinate values in my puzzle input, which were all less than 10,000, so I decided it acceptable to further compress the representation to Point(i32). This got me down to the 20 ms runtime, again presumably thanks to faster hashing.

Day 4: Secure Container

Challenge 4 is about counting the number of numbers in a range that satisfy a list of criteria. My initial solution did a simple brute force scan checking each number in the whole range, and took ~110 ms to run.

After submitting my solution I did a first round of cleanup. This made two performance improvements, reducing run time to ~31 ms:

  • Instead of always computing results for all conditions, return false immediately if any condition does not pass.
  • Instead of two separate test functions checking the conditions for part 1 and part 2 separately, check the part 1 conditions first. Check the conditions for part 2 only if the part 1 conditions pass.

A much greater performance gain, though, came from replacing the brute force loop with a smarter iterator that exploits one of the given rules:

Going from left to right, the digits never decrease; they only ever increase or stay the same (like 111123 or 135679).

Instead of a plain integer value, the iterator stores its current state as a vector of digits with the least significant digit first. Figure 2 shows how it works: When initialized, it works through the digits from most to least significant and sets each digit equal to the previous if the previous is greater. To generate the next number, a 0 is first added to the end if all digits are 9s. Next, the first non-9 digit is incremented, and then all preceding 9s are set to the same incremented value.

Figure 2: Generating numbers with no descending digits

This means we won’t even visit any numbers that don’t satisfy the above condition. This reduced the number of numbers I needed to check from 540,880 to 1,073, and reduced runtime to ~290 μs.

Finally, I was still converting each number to a string to check the conditions about adjacent digits. The new number representation made this additional conversion unnecessary, and removing it further reduced runtime to 120 μs - yes, three orders of magnitude faster than the initial solution.

Day 5: Sunny with a Chance of Asteroids

Day 5 adds a few more features to Intcode, and is just a matter of running a program on the upgraded engine. My solution runs in 15 μs, which didn’t change much from my initial solution.