Last year I decided to solve Advent of Code in Rust, because it’s a very interesting language I would like to learn better. As Rust claims to be “blazingly fast and memory-efficient”, this choice of course got me thinking more about performance than I usually do when solving Advent of Code problems. Previously I’ve solved Advent of Code in Scala and Clojure, and while I did make an effort for my solutions to be fast enough to not be annoying, I was generally happy if I got them down to about a second or less per problem. But I have seen some other people showing off highly optimized solutions, so this time, armed with the power of a modern language designed for extreme performance and zero-cost abstractions, I wanted to see how low I could go.
In this series of posts, I will describe my solutions and the tricks I used to squeeze out every bit of performance I could find. I will assume that you are familiar with programming basics, but not necessarily the performance characteristics of various standard algorithms and data structures. While I used Rust and do make use of many Rust-specific features, the greatest performance gains are always from algorithmic improvements that apply equally well to almost any programming language. The solutions are all single-threaded and do not, as far as I’m aware, rely on any processor-specific features such as SIMD instructions, so they should run on most processors Rust can compile to.
These are my results:
| Day | Time, μs | Day | Time, μs |
|-----|----------|-----|----------|
Of course, any technical report is kind of useless without a description of methodology. In the next section I’ll describe how these results were measured. After that, we’ll get on to the solutions.
To measure performance, I used the benchmarking features available in the Rust standard library. In summary, the command `cargo bench` will look for functions tagged with the `#[bench]` attribute. Each benchmark function may perform some setup, and then starts the measurement by defining a lambda function (a "closure" in Rust terminology) to measure. Cargo builds the program in release mode, calls the given closure a large number of times, and reports the average run time.
For example, a benchmark file can look like this:
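A minimal sketch of such a benchmark file (this assumes a nightly toolchain, since `#[bench]` and the `test` crate are unstable; the `solve` function and its input here are placeholders, not my actual solver):

```rust
#![feature(test)]
extern crate test;

use test::Bencher;

// Placeholder solver: counts non-empty lines.
fn solve(input: &[String]) -> usize {
    input.iter().filter(|line| !line.is_empty()).count()
}

#[bench]
fn bench_solve(b: &mut Bencher) {
    // Setup runs once and is not measured.
    let input: Vec<String> = vec!["a".to_string(), String::new(), "b".to_string()];
    // Only the closure passed to b.iter is timed.
    b.iter(|| solve(&input));
}
```

Running `cargo bench` then executes the closure repeatedly and prints the average time per iteration.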
The `b.iter(...)` line is the important part: it measures the runtime of the code inside the closure.
The results are reported like this:

```
test intcode::day9_example_3_new ... bench: 77 ns/iter (+/- 1)
```
As you can see, this method can measure some very short runtimes. This is important, as simply using the `time` command or similar to measure the execution time of a whole process runs into all kinds of overhead issues. This benchmarking tool, while simple, allows for much greater granularity.
All this said, I’m not an expert on these or any other benchmarking tools - I simply used the one described in the Rust book. The results tend to fluctuate a bit between benchmark runs, but are stable enough for some simple performance comparisons. In both the previous section and the following posts, I’ll be rounding my results to 2 significant figures (and 3 for the total) to not imply too much precision. For this topic, orders of magnitude are more interesting than fine details anyway.
The benchmarks of the full solvers for each day all use the following pattern: the `solve` function takes a `Vec<String>` as input and returns the solution. This means the performance measurement does not include reading the input from a text file into memory as a list of strings, but does include all the parsing and processing of that list of strings. Similarly, the measurement of all days combined first reads all inputs into memory and then benchmarks each solution from `Vec<String>` to solution.
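That pattern can be sketched like this (again assuming a nightly toolchain for `#[bench]`; the file path and the `solve` signature are illustrative, not the actual ones from my repository):

```rust
#![feature(test)]
extern crate test;

use std::fs;
use test::Bencher;

// Illustrative stand-in for a real day's solver.
fn solve(input: &[String]) -> u64 {
    input.len() as u64
}

#[bench]
fn bench_day01(b: &mut Bencher) {
    // File I/O happens here, outside the measurement.
    let input: Vec<String> = fs::read_to_string("input/day01.txt")
        .expect("input file missing")
        .lines()
        .map(String::from)
        .collect();
    // Only the Vec<String> -> solution step is timed.
    b.iter(|| solve(&input));
}
```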
All timings were run on an AMD Ryzen 7 3700X processor.
That’s all for the introduction! The following posts will dive into details on each of the solutions.