AOC 2016 - Day 5

Problem statement on Advent of Code website.

Code for this post can be found on GitHub.

Background

On Day 1, we located Easter Bunny headquarters.
On Day 2, we broke the code to use the bathroom in Easter Bunny HQ.
On Day 3, we helped the EBs' design department by removing invalid triangles from the list.
On Day 4, we figured out which rooms in the information kiosk are valid.

On Day 5, we will figure out how to get past security doors by calculating the code.

Problem - Part 1

For Part 1, we are given two pieces of information.

The door ID
An increasing integer index, starting at 0.

For each MD5 hash that starts with 00000 (five 0's), the 6th digit indicates the next character of the password. If we find eight hashes that start with five 0's, that provides the password.

Solution - Part 1

My strategy to solve this problem was two-fold.

Write an MD5 algorithm in F#.
1. I outlined my solution for a functional MD5 algorithm in F# in a previous blog post.
Write a function to collect the required hashes and construct the final answer.

The vast majority of the functionality for this solution is in the MD5 algorithm, which I won't describe here again.

Collection function

Once I had an MD5 algorithm, I used the Seq module and its library functions to collect the required hashes and calculate the door password.

The strategy for the function is simple.

Start by producing indices starting at 0.
Map all the strings to hashes.
Remove all hashes that don't match the criteria (i.e. hashes that don't start with 00000).
Once we have eight hashes, stop producing indices and get the final answer.

let day5part1input = "cxdnnyjw"

let day5part1criteria = "00000"

let day5part1 input crit (md5sum:string -> string) =
Seq.initInfinite (fun i -> input + string i)
|> Seq.map md5sum
|> Seq.filter (fun m -> m.StartsWith crit)
|> Seq.take 8
|> Seq.map (Seq.item 5 >> string)
|> Seq.reduce ( + )

When I ran the function, I got F77A0E6E, which is the correct answer.

Problem - Part 2

The problem statement for Part 2 still uses the MD5 function, but adds a twist to the password computation. We still have the same information from the problem.

The door ID
An increasing integer index, starting at 0.

We are still only interested in hashes that start with five 0's, i.e. 00000. The sixth character of the hash now indicates the position in the solution and the seventh character is what goes into that position.

If we get multiple hashes that meet the criteria and indicate the same position, all but the first match are ignored. Hashes that meet the criteria but indicate an invalid position, i.e. anything other than 0-7, are also ignored.

Solution - Part 2

To solve Part 2, I wanted to use a similar approach to Part 1. However, I needed to way to determine when the algorithm had collected enough values to form a complete answer.

My overall strategy was as follows.

Start by producing indices starting at 0.
Map all the strings to hashes.
Remove all hashes that don't match the criteria.
Once I have a character for each position of the answer, stop producing indices and get the final answer.

Storing the pieces of the answer

I chose to use a Dictionary<int, string> collection to hold the values that would form the answer. The keys represent the position in the answer and the values are the single-character strings that, when concatenated, would form the answer.

First, I created a value that would hold the relevant pieces.

let answer = Dictionary<int, string> 8

Second, I created an add function that would only add values to the Dictionary if a value for the key was not already present.

let add (d:Dictionary<_,_>) kv =
if not <| d.ContainsKey(fst kv) then
  d.Add(fst kv, snd kv)
else ()

Third, I created a notComplete function to check whether I had collected enough values for an answer.

let notComplete (d:Dictionary<_,_>) = d.Count < 8

Collecting the pieces of the answer

Once I had decided on a storage strategy, I wrote a function to add values to the Dictionary until I had enough to get an answer.

The overall strategy for this part of the logic depends on the Seq.takeWhile function. This function will terminate the infinite sequence generator once the Dictionary is "full".

Seq.initInfinite (fun i -> input + string i)
|> Seq.map md5sum
|> Seq.filter
(fun m ->
  (m.StartsWith crit)
    && (m.[5] = '0' || m.[5] = '1' || m.[5] = '2' || m.[5] = '3'
    || m.[5] = '4' || m.[5] = '5' || m.[5] = '6' || m.[5] = '7'))
|> Seq.map (fun m -> m.[5] |> string |> System.Int32.Parse, m.[6] |> string)
|> Seq.takeWhile (fun _ -> notComplete answer)
|> Seq.iter (fun m -> add answer m)

Producing the answer

As the last step, I pulled the values from the Dictionary and concatenated the strings to get the final answer.

I chose to explicitly pull the values based on key lookups because, while there are library functions to get a collection of the values, I did not see any guarantees about the order that the items would be in.

[
for i in 0 .. 7 do
  yield answer.[i]
]
|> List.reduce ( + )

When I ran this function, I got back 999828EC which is, in fact, the correct answer.

Some side notes

Mutable vs. Immutable data structures

As you may have noticed, I am using a mutable data structure to store the needed values for the Part 2 solution. At the time, I struggled to find a "pure" functional way of solving the problem but couldn't come up with something off hand (and, to tell you the truth, I didn't try too hard since I had just finished the MD5 algorithm and wanted to be done with this problem).

However, the Part 2 algorithm can easily be rewritten using a tail-recursive function and immutable data structures. The tail-recursive part is important because, while I don't know how many hashes the algorithm has to go through, I do know that even the OOTB MD5 algorithm takes over 2 minutes to calculate the answer (and my functional MD5 algorithm takes over 7 minutes). From that, I assume that the stack would get very deep if the function was not tail-recursive.

`Seq` vs. `ParallelSeq`

In order to speed up the calculation, I attempted to use the ParallelSeq library to take advantage of my CPU's multiple cores. Using the ParallelSeq library did speed up my calculations considerably.

However, one thing I learned is that the ParallelSeq library does not take care of putting the calculated values back in their original order. Instead, it is the client's responsibility to track the order of values (if that is important - and in this case it was).

As an example, when I used the ParallelSeq library, the system produced C9E29889 as the answer for Part 2. However, the correct answer for Part 2 is 999828EC, which are the same characters but in the correct order.

In the end, I removed ParallelSeq from this solution but it is a library that I intend to investigate in the future.

Lessons learned

Here are the lessons I learned from this exercise.

Take the time to implement a solution the way you want, so that you do not later have regrets about it. In this instance, I regret choosing a mutable data structure and imperative algorithm to solve Part 2 and it's something I hope to rectify in the near future.
Just as when I implemented the MD5 algorithm in F#, it is critical to be extra vigilant when reading code. Much more so than any other technique, reading and understanding code is the fastest way to find and fix most bugs.

See you next time!

PS: Solution - Part 2 (Pure functional implementation)

After I wrote this blog post, but before posting it, I decided to rewrite the Part 2 solution using immutable data structures.

Rewriting the function was quite simple, and relied on the following strategy.

Define a completion check to terminate the recursive function.
Define a tail-recursive function to calculate the 8 characters of the answer.
Combine the 8 characters to form the final answer.

Here is the code for the pure functional implementation of Part 2, broken into 3 chunks.

let day5part2alt input crit (md5sum : string -> string) =
let complete l = l |> List.filter ((=) "") |> List.isEmpty

let rec collector i acc =
  if complete acc then acc
  else
    let m = md5sum (input + string i)
    if (m.StartsWith crit) then
      match m.[5] with
      | '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' ->
        List.mapi (fun idx elt ->
          if elt = "" && idx = (m.[5] |> string |> System.Int32.Parse) then
            string m.[6]
          else elt) acc
        |> collector (i + 1)
      | _ -> collector (i + 1) acc
    else collector (i + 1) acc

[ ""; ""; ""; ""; ""; ""; ""; "" ]
|> collector 0
|> List.reduce (+)

Running this function provides the same answer as above, i.e. 999828EC.

Performance comparison with `day5part2`

I compared day5part2 and day5part2alt to see whether my new implementation matches up to the previous one.

On average, I received the following results.

*`day5part2` vs. `day5part2alt`, raw numbers.*
Function	Real	CPU	Gen0	Gen1	Gen2
`day5part2`	174.758	174.689	21623	18	2
`day5part2alt`	166.528	166.437	20590	17	1

And here are the same results, converted to percentages.

*`day5part2` vs. `day5part2alt`, percentages.*
Function	Real	CPU	Gen0	Gen1	Gen2
`day5part2`	105%	105%	105%	106%	200%
`day5part2alt`	100%	100%	100%	100%	100%

Ignoring the Gen2 result, the mutable version is actually 5% worse across the board, in both processing time and increased garbage collection. It pays to be immutable.