Last year I decided to solve Advent of Code in Rust, because it’s a very interesting language I would like to learn better. As Rust claims to be “blazingly fast and memory-efficient”, this choice of course got me thinking more about performance than I usually do when solving Advent of Code problems. Previously I’ve solved Advent of Code in Scala and Clojure, and while I did make an effort for my solutions to be fast enough to not be annoying, I was generally happy if I got them down to about a second or less per problem. But I have seen some other people showing off highly optimized solutions, so this time, armed with the power of a modern language designed for extreme performance and zero-cost abstractions, I wanted to see how low I could go.

In this series of posts, I will describe my solutions and the tricks I used to squeeze out every bit of performance I could find. I will assume that you are familiar with programming basics, but not necessarily the performance characteristics of various standard algorithms and data structures. While I used Rust and do make use of many Rust-specific features, the greatest performance gains are always from algorithmic improvements that apply equally well to almost any programming language. The solutions are all single-threaded and do not, as far as I’m aware, rely on any processor-specific features such as SIMD instructions, so they should run on most processors Rust can compile to.

So let’s get started! The code is available on GitHub. The solutions should work with any puzzle input - if they do not, please let me know!

These are my results:

Day   Time, μs    Day   Time, μs
  1        2.0     14        470
  2      5,400     15      1,300
  3     20,000     16      8,600
  4        120     17      1,900
  5         15     18     11,000
  6        260     19      1,300
  7      1,100     20     11,000
  8        340     21      8,800
  9      4,900     22        450
 10     13,000     23      2,800
 11      2,200     24      2,600
 12     18,000     25      5,700
 13      8,000    All    130,000

Of course, any technical report is kind of useless without a description of methodology. In the next section I’ll describe how these results were measured. After that, we’ll get on to the solutions.

Methodology

To measure performance, I used the built-in benchmarking support in Rust's test crate. This is an unstable feature, so it requires a nightly toolchain - hence the #![feature(test)] attribute in the example below. In summary, the command cargo bench will look for functions tagged with #[bench] in the benches/ directory. Each benchmark function can do some setup, and then hands the code to be measured to the benchmark harness as a lambda function ("closure" in Rust terminology). Cargo builds the program in release mode, calls the closure some large number of times, and reports the average run time per iteration. For example, a benchmark file can look like this:

#![feature(test)]
extern crate test;
use adventofcode_2019::intcode::IntcodeComputer;
use test::Bencher;

#[bench]
fn day9_example_3_new(b: &mut Bencher) {
    let program = vec![104, 1125899906842624, 99];
    assert_eq!(
        IntcodeComputer::new(program.clone()).run(None)[0],
        program[1]
    );
    b.iter(|| IntcodeComputer::new(program.clone()).run(None));
}

The b.iter(...) line is the important part: it will measure the runtime of the code IntcodeComputer::new(program.clone()).run(None). The results are reported like this:

test intcode::day9_example_3_new   ... bench:          77 ns/iter (+/- 1)

As you can see, this method can measure some very short runtimes. This is important: simply using the time command or similar to measure the execution time of a whole process includes overhead such as process startup, which would drown out solutions that run in microseconds. This benchmarking tool, while simple, allows for much finer granularity.

All this said, I’m not an expert on these or any other benchmarking tools - I simply used the one described in the Rust book. The results tend to fluctuate a bit between benchmark runs, but are stable enough for some simple performance comparisons. In both the previous section and the following posts, I’ll be rounding my results to 2 significant figures (and 3 for the total) to not imply too much precision. For this topic, orders of magnitude are more interesting than fine details anyway.

The benchmarks of the full solvers for each day all use the following pattern:

fn run_bench(day: u8, b: &mut Bencher) {
    let input_lines: Vec<String> = read_input_file(day);
    let solve = days::get_solver(day).unwrap();
    b.iter(|| solve(&input_lines));
}
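The read_input_file helper is not shown above, so here is a sketch of what it might look like. The input/dayNN.txt naming scheme is my own assumption for illustration; the actual repository may lay out its input files differently.

```rust
use std::fs;

// Hypothetical helper matching the pattern above: reads the puzzle input
// for the given day into a Vec of lines. The "input/dayNN.txt" path scheme
// is an assumption, not taken from the actual repository.
fn read_input_file(day: u8) -> Vec<String> {
    let path = format!("input/day{:02}.txt", day);
    fs::read_to_string(&path)
        .unwrap_or_else(|e| panic!("failed to read {}: {}", path, e))
        .lines()
        .map(str::to_string)
        .collect()
}
```

Note that this reads the whole file into memory up front, which is what lets the benchmark exclude file I/O from the measured time.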

Each solve function takes a Vec<String> as input and returns a (String, String). This means the performance measurement does not include reading input from a text file into memory as a list of strings, but does include all parts of parsing and processing that list of strings. Similarly, the measurement of all days combined first reads all inputs into memory and then benchmarks each solution from Vec<String> to solution. All timings were run on an AMD Ryzen 7 3700X processor.
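To make that interface concrete, here is a hypothetical toy solver with the shape just described. The summing logic is invented purely for illustration and is not one of my actual solutions:

```rust
// A hypothetical solver with the described shape: it receives the input
// as a list of lines and returns the two puzzle answers as strings.
fn solve_example(input_lines: &[String]) -> (String, String) {
    // Invented logic for illustration: part 1 sums the numbers on each
    // line, part 2 sums their squares.
    let nums: Vec<i64> = input_lines
        .iter()
        .filter_map(|line| line.trim().parse().ok())
        .collect();
    let part1: i64 = nums.iter().sum();
    let part2: i64 = nums.iter().map(|n| n * n).sum();
    (part1.to_string(), part2.to_string())
}
```

Since every day exposes the same signature, the benchmark harness can treat all solvers uniformly, as the run_bench pattern above does.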

That’s all for the introduction! The following posts will dive into details on each of the solutions.