AOC 2016 - Day 2 Alternate Solutions
Code for this post can be found on GitHub.
This post is a follow-up to my last post on my solution for AOC Day 2 and how I was unhappy with the design (though my initial results were still correct).
Since that time, I have implemented a set of five graph data structures to make the solution more robust than simply encoding the minimum solution with a function.
These data structures have a few advantages:
- The data types are much more robust - for the board and the graphs. Instead of talking in terms of raw numbers / strings, I can now speak in terms of Vertices and Edges.
- The solution's design allows for the underlying data structure to be swapped out as long as we can provide implementations of two key functions -
create
andgetNext
. - Different graph structures can be compared in terms of (basic) performance. I will provide my numbers at the end of this blog.
Modeling a board
Code on GitHub
Storing a board in a text file
First, I changed the way that boards are stored. Rather than hard-coding the board or worse, encoding the representation directly into the algorithm with no easy way to change it, I picked a very simple text-based format.
Blank spaces are represented by dots / periods (.
). All other characters represent a unique Vertex
.
For day 2, part 1, the following board is stored in a text file.
123
456
789
Similarly, here is the day 2, part 2 board:
..1..
.234.
56789
.ABC.
..D..
Boards are represented internally as a set of vertices and edges. Let's look at those in more detail.
Vertices
A Vertex
is a tuple consisting of an integer accompanied by a string label. Here are the type and module signatures for Vertex
.
type Vertex =
| V of int * string
with
override ToString : unit -> string
end
module Vertex = begin
val create : idx:int -> label:string -> Vertex
val toInt : _arg1:Vertex -> int
val toString : x:Vertex -> string
val strToV : lbl:string -> (seq<Vertex> -> Vertex)
val intToV : idx:int -> (seq<Vertex> -> Vertex)
end
Here are the expected properties of a Vertex
.
- Each
Vertex
must be represented by a unique integer. - Each
Vertex
must have a unique string label (forstrToV
to work correctly). - While not strictly required, the on-screen display for certain graph data structures is much prettier if each string label has a length of 1 (aka a single character).
All the properties listed above are not enforced by the data type. However, some properties are enforced by the Board
creation algorithm.
Edges
The AOC Day 2 problem requires directed graphs. Thus, an edge is stored as a pair of vertices and a direction.
We have four directions in the AOC problem - R
ight, L
eft, U
p, and D
own.
Direction
also has both a type and an accompanying module:
type Direction =
| R
| L
| U
| D
with
override ToString : unit -> string
end
module Direction = begin
val opposite : _arg1:Direction -> Direction
val toDirection : _arg1:string -> Direction
end
As I was working on the alternate solutions for Day 2, I added certain functions to various modules as I was writing the graph structures. However, in certain cases, I did not require those functions in the final solution but still left them in the module. A good example of this is the opposite
function in the Direction
module.
The Edge
type and module are as follows:
type Edge =
| E of Vertex * Direction * Vertex
with
override ToString : unit -> string
end
module Edge = begin
val create : dir:Direction -> v1:Vertex * v2:Vertex -> Edge
val toString : e:Edge -> string
val fromV : Edge -> Vertex
val toV : Edge -> Vertex
val direction : Edge -> Direction
end
fromV
, direction
, and toV
represent the three individual components of an Edge
.
Some properties of an Edge
:
- Edges are defined solely by the two vertices they connect and the direction of the connection. They have no other unique identifiers (internal or external).
Building the board
Now that all the building blocks are in place, we can build the Board
. The type and module for Board
are as follows.
type Board =
{vertices: seq<Vertex>;
edges: seq<Edge>;}
with
override ToString : unit -> string
end
module Board = begin
val transpose : rows:seq<#seq<'b>> -> seq<seq<'b>>
val validEdge : l1:string * l2:string -> bool
val potentialEdgesByRow :
verts:seq<Vertex> -> rows:seq<#seq<string>> -> seq<Vertex * Vertex>
val reverse : x:'a * y:'b -> 'b * 'a
val squareup : strs:seq<seq<string>> -> seq<seq<string>>
val parseBoard : strB:string -> Board
end
Most of the functions for the Board
module are internal functions and should not be invoked by an external client. The most important function in the module is parseBoard
, which takes in a string representing the board (e.g. the text files I mentioned earlier) and returns a constructed Board
that meets a number of requirements.
- Boards are two-dimensional.
- Each character in the string representation that is not a blank space is turned into a unique vertex.
- Each vertex is given a unique integer ID.
- Boards do not contain diagonal edges.
- Each pair of neighboring vertices has two edges between them.
- Given two neighboring vertices, the two edges between them have opposite directions.
Generic solution framework
The solution performs the following steps to get to the final answer for the problem.
- Create a
Board
from theboardFile
. - Use that
Board
to create a graph usingbCreateF
. - Translate the instructions from the
instrFile
intoDirection
s. - Take each line of instructions and run them through the graph. The start
Vertex
for each line of instructions is the result from the previous line of instructions, except for the first line of instructions which usesVertex
"5" as its startVertex
. - Turn the result from each line of instructions into a
string
, then concatenate them.
The solution framework has two functions, moveAll
and day2part1
. The former uses bGetNextF
to move through a line of instructions. The latter is the connector that converts a board and lines of instructions into a final answer.
The solution framework's signature is as follows.
module Solution = begin
val moveAll :
func:(Vertex -> 'a -> Vertex) ->
startv:Vertex -> instrs:seq<'a> -> Vertex
val day2part1 :
boardFile:string ->
instrFile:string ->
bCreateF:(Board -> 'a) ->
bGetNextF:('a -> Vertex -> Direction -> Vertex) -> string
end
Graph data structures
The graph data structures below are admittedly naively implemented, except inductive graphs which make use of the Hekate library. However, they all follow certain rules:
- Each data structure's type is hidden within its respective module.
- Once a graph is created, it is not modified in any way.
- If the
getNext
function cannot find a move in the given direction from the given source vertex, it will return the source vertex to the caller.
Adjacency Matrix
Code on GitHub.
Per Wikipedia, an adjacency matrix is "a square matrix used to represent a finite graph".
Since the AOC Day 2 implementation requires a directed graph, the adjacency matrix cannot be a symmetric matrix. For my implementation, the label for each row represents the "from Vertex
", the label for each column represents the "to Vertex
" and the value at their intersection represents the Direction
of movement.
For example, when looking at the Day 2, Part 1 board below, you can read the first entry as "When you start from Vertex 1 and go Right, you arrive at Vertex 2".
Here is the text representation of the Day 2, Part 1 board as an adjacency matrix.
123456789
1 .R.D.....
2 L.R.D....
3 .L...D...
4 U...R.D..
5 .U.L.R.D.
6 ..U.L...D
7 ...U...R.
8 ....U.L.R
9 .....U.L.
And here is the text representation of the Day 2, Part 2 board.
123456789ABCD
1 ..D..........
2 ..R..D.......
3 UL.R..D......
4 ..L....D.....
5 .....R.......
6 .U..L.R..D...
7 ..U..L.R..D..
8 ...U..L.R..D.
9 .......L.....
A .....U....R..
B ......U..L.RD
C .......U..L..
D ..........U..
Internally, an adjacency matrix is stored as an Array2D [,]
. The scheme used to create this data structure is somewhat fragile because it assumes that the first Vertex has a unique integer ID of 0
and that subsequent vertices' IDs are incremented deterministically without gaps. Since we control Board
and Vertex
creation, we can rely on this assumption for now but it would be a dangerous assumption to make in production code.
The getNext
function takes the following steps to move through the board:
- Use the 2D array to find the row representing the
from
Vertex. - Once found, go through the row and looks for the desired
Direction
. - If found, use that column index to go from an integer to a
Vertex
usingBoard.vertices
. - If not found, return the
from
Vertex to the caller.
let getNext am from dir =
match am with
| AM(x, b) ->
let fromInt = Vertex.toInt from
let colBase = Array2D.base2 x
x.[fromInt..fromInt,colBase..]
|> Seq.cast<Direction option>
|> Seq.tryFindIndex (( = ) (Some dir))
|> function
| Some idx -> Seq.item idx b.vertices
| None -> from
Edge List
Code on GitHub.
An edge list is an extremely simple graph implementation which does not transform the original Board
's internal structures in any way. It is, as the name implies, an exhaustive list of all edges in the graph.
Here is a visual representation of the Day 2, Part 1 board.
Edges: ["4->U->1"; "7->U->4"; "5->U->2"; "8->U->5"; "6->U->3"; "9->U->6"; "1->D->4";
"4->D->7"; "2->D->5"; "5->D->8"; "3->D->6"; "6->D->9"; "1->R->2"; "2->R->3";
"4->R->5"; "5->R->6"; "7->R->8"; "8->R->9"; "2->L->1"; "3->L->2"; "5->L->4";
"6->L->5"; "8->L->7"; "9->L->8"]
And the Day 2, Part 2 board.
Edges: ["6->U->2"; "A->U->6"; "3->U->1"; "7->U->3"; "B->U->7"; "D->U->B"; "8->U->4";
"C->U->8"; "2->D->6"; "6->D->A"; "1->D->3"; "3->D->7"; "7->D->B"; "B->D->D";
"4->D->8"; "8->D->C"; "2->R->3"; "3->R->4"; "5->R->6"; "6->R->7"; "7->R->8";
"8->R->9"; "A->R->B"; "B->R->C"; "3->L->2"; "4->L->3"; "6->L->5"; "7->L->6";
"8->L->7"; "9->L->8"; "B->L->A"; "C->L->B"]
The getNext
function for an edge list is quite simple as well.
let getNext (EL(b)) from dir =
let destv =
Seq.tryFind
(fun (E(v1,d,v2)) -> v1 = from && d = dir)
b.edges
match destv with
| Some(E(_,_,v2)) -> v2
| None -> from
Adjacency List
Code on GitHub
As opposed to an edge list, an adjacency list takes a Vertex
-centric approach to representing graphs. Many implementations of adjacency lists internally use a data structure like a hashmap or hashtable, where the key is a Vertex
and the value is a list of vertices (possibly with additional data) that the key can connect to.
For my implementation, I used two separate data structures to implement the adjacency lists - dict
and Map
. The keys are Vertex
instances and the values are lists of Edge
s.
Here is a visual representation of the Day 2, Part 1 board.
1==>1->D->4, 1->R->2,
2==>2->D->5, 2->R->3, 2->L->1,
3==>3->D->6, 3->L->2,
4==>4->U->1, 4->D->7, 4->R->5,
5==>5->U->2, 5->D->8, 5->R->6, 5->L->4,
6==>6->U->3, 6->D->9, 6->L->5,
7==>7->U->4, 7->R->8,
8==>8->U->5, 8->R->9, 8->L->7,
9==>9->U->6, 9->L->8,
Here is the Day 2, Part 2 board.
1==>1->D->3,
2==>2->D->6, 2->R->3,
3==>3->U->1, 3->D->7, 3->R->4, 3->L->2,
4==>4->D->8, 4->L->3,
5==>5->R->6,
6==>6->U->2, 6->D->A, 6->R->7, 6->L->5,
7==>7->U->3, 7->D->B, 7->R->8, 7->L->6,
8==>8->U->4, 8->D->C, 8->R->9, 8->L->7,
9==>9->L->8,
A==>A->U->6, A->R->B,
B==>B->U->7, B->D->D, B->R->C, B->L->A,
C==>C->U->8, C->L->B,
D==>D->U->B,
The getNext
implementations for both the dict
and Map
versions are very similar, only differing in how the various values are accessed.
First, here is the getNext
implementation for the dict
-version of adjacency lists.
let getNext (AL(d,_)) from dir : Vertex =
let es = d.Item from
match Seq.tryFind (fun (E(_,d,_)) -> dir = d) es with
| Some (E(_,_,v2)) -> v2
| None -> from
And, similarly, here is the Map
-version of getNext
.
let getNext (AL(d,_)) from dir =
let es = Map.find from d
match Seq.tryFind (fun (E(_,d,_)) -> dir = d) es with
| Some (E(_,_,v2)) -> v2
| None -> from
Inductive Graph
Code on GitHub.
Finally, I used an inductive graph library, Hekate, as my final representation of a graph in F#. As I noted in my last blog post, inductive graphs were introduced by Erwig in his 2001 paper and then implemented (first) as the Haskell fgl library.
Inductive graphs are functional data structures that allow similar operations to other inductive data structures such as lists and trees.
Despite using a library here instead of implementing the data structure myself, I actually enjoyed writing code with this library. The authors have done a great job of using familiar syntax and patterns with Hekate. The only downside I could find is that the Hekate library is almost completely undocumented. I learned how to use the library from the unit tests I found on GitHub and from reading Erwig's original paper.
Here is the Day 2, Part 1 board represented visually as an inductive graph.
map
[(V (0,"1"),
(map [(V (1,"2"), L); (V (3,"4"), U)], "1",
map [(V (1,"2"), R); (V (3,"4"), D)]));
(V (1,"2"),
(map [(V (0,"1"), R); (V (2,"3"), L); (V (4,"5"), U)], "2",
map [(V (0,"1"), L); (V (2,"3"), R); (V (4,"5"), D)]));
(V (2,"3"),
(map [(V (1,"2"), R); (V (5,"6"), U)], "3",
map [(V (1,"2"), L); (V (5,"6"), D)]));
(V (3,"4"),
(map [(V (0,"1"), D); (V (4,"5"), L); (V (6,"7"), U)], "4",
map [(V (0,"1"), U); (V (4,"5"), R); (V (6,"7"), D)]));
(V (4,"5"),
(map [(V (1,"2"), D); (V (3,"4"), R); (V (5,"6"), L); (V (7,"8"), U)], "5",
map [(V (1,"2"), U); (V (3,"4"), L); (V (5,"6"), R); (V (7,"8"), D)]));
(V (5,"6"),
(map [(V (2,"3"), D); (V (4,"5"), R); (V (8,"9"), U)], "6",
map [(V (2,"3"), U); (V (4,"5"), L); (V (8,"9"), D)]));
(V (6,"7"),
(map [(V (3,"4"), D); (V (7,"8"), L)], "7",
map [(V (3,"4"), U); (V (7,"8"), R)]));
(V (7,"8"),
(map [(V (4,"5"), D); (V (6,"7"), R); (V (8,"9"), L)], "8",
map [(V (4,"5"), U); (V (6,"7"), L); (V (8,"9"), R)]));
(V (8,"9"),
(map [(V (5,"6"), D); (V (7,"8"), R)], "9",
map [(V (5,"6"), U); (V (7,"8"), L)]))]
And here is the Day 2, Part 2 board as an inductive graph. Please note that F# Interactive actually truncated the print out. I am assuming it did this due to the size of the map.
map
[(V (0,"1"), (map [(V (2,"3"), U)], "1", map [(V (2,"3"), D)]));
(V (1,"2"),
(map [(V (2,"3"), L); (V (5,"6"), U)], "2",
map [(V (2,"3"), R); (V (5,"6"), D)]));
(V (2,"3"),
(map [(V (0,"1"), D); (V (1,"2"), R); (V (3,"4"), L); (V (6,"7"), U)], "3",
map [(V (0,"1"), U); (V (1,"2"), L); (V (3,"4"), R); (V (6,"7"), D)]));
(V (3,"4"),
(map [(V (2,"3"), R); (V (7,"8"), U)], "4",
map [(V (2,"3"), L); (V (7,"8"), D)]));
(V (4,"5"), (map [(V (5,"6"), L)], "5", map [(V (5,"6"), R)]));
(V (5,"6"),
(map [(V (1,"2"), D); (V (4,"5"), R); (V (6,"7"), L); (V (9,"A"), U)], "6",
map [(V (1,"2"), U); (V (4,"5"), L); (V (6,"7"), R); (V (9,"A"), D)]));
(V (6,"7"),
(map [(V (2,"3"), D); (V (5,"6"), R); (V (7,"8"), L); (V (10,"B"), U)], "7",
map [(V (2,"3"), U); (V (5,"6"), L); (V (7,"8"), R); (V (10,"B"), D)]));
(V (7,"8"),
(map [(V (3,"4"), D); (V (6,"7"), R); (V (8,"9"), L); (V (11,"C"), U)], "8",
map [(V (3,"4"), U); (V (6,"7"), L); (V (8,"9"), R); (V (11,"C"), D)]));
(V (8,"9"), (map [(V (7,"8"), R)], "9", map [(V (7,"8"), L)])); ...]
The getNext
algorithm is relatively simple, considering that each Vertex
stores information about both its predecessors and successors, i.e. what points to the Vertex
and what the Vertex
points to, respectively.
let getNext ig from dir =
match Graph.Nodes.successors from ig with
| None -> from
| Some succs ->
succs
|> List.tryFind (fun (_,d) -> d = dir)
|> function
| None -> from
| Some (dest,_) -> dest
Testing
Code on GitHub.
In the past, I have struggled quite a bit with properly implementing FsCheck tests. However, this time around, the combination of using FsCheck directly (as opposed to using it in combination with a testing library like Fuchu or Xunit) and using it from F# script files (as opposed to compiled programs) made the experience much more pleasant and was a great learning experience.
I wrote tests using FsCheck to ensure that my implementations were correct - and I'm glad I did. I found a number of elementary mistakes by having automated tests that I could easily run after each change to validate my results. For each data structure, I wrote 6 tests:
- A printout of the Day 2, Part 1 board (to be verified manually).
- A printout of the Day 2, Part 2 board (to be verified manually).
- The Day 2, Part 1 test provided in the AOC problem description.
- The Day 2, Part 2 test provided in the AOC problem description.
- The Day 2, Part 1 problem stated in the AOC problem description.
- The Day 2, Part 2 problem stated in the AOC problem description.
I could not automate the first two tests because each data structure's printout was quite different from the others.
I was able to implement the last two tests because I had already solved Day 2's problems and knew the correct answers.
Performance
Raw data on GitHub - requires a program that can read XLSX files.
After writing the five graph data structure implementations, I decided to run some basic performance tests on them. Having a generalized solution framework allowed me to attribute all differences in runtimes to the specific data structure being used.
To perform the tests in the most "unbiased" way possible, I ran each test 5 times and took the average of the results. For each run, I executed the last 4 tests listed in the Testing section.
Without further ado, here are the results from using F# interactive's #time
directive.
Algorithm | Real average | CPU average | Gen0 average | Gen1 average | Gen2 average |
---|---|---|---|---|---|
Adjacency List Dict | 0.8452 | 0.8326 | 104.2 | 25.2 | 0 |
Adjacency List Map | 0.8684 | 0.8576 | 105.8 | 21.6 | 0 |
Inductive Graph | 0.9894 | 0.9852 | 136.2 | 33 | 0 |
Adjacency Matrix | 4.4932 | 4.4798 | 530.6 | 78.8 | 0.6 |
Edge List | 359.2024 | 359.1448 | 68896.6 | 61.4 | 3.6 |
I then took the Adjacency List Dict implementation as the baseline (since it was the fastest) and translated the same table into percentages.
Algorithm | Real | CPU | Gen0 | Gen1 | Gen2 |
---|---|---|---|---|---|
Adjacency List Dict | 100% | 100% | 100% | 100% | - |
Adjacency List Map | 103% | 103% | 102% | 86% | - |
Inductive Graph | 117% | 118% | 131% | 131% | - |
Adjacency Matrix | 532% | 538% | 509% | 313% | 100% |
Edge List | 42499% | 43135% | 66120% | 244% | 600% |
Observations
I was not surprised to find that the Edge List was the worst performer. However, I was surprised by how much worse its performance was, even on a small board, than the next worst algorithm, the Adjacency Matrix.
Along those same lines, I was surprised by how poor the performance of the Adjacency Matrix was. However, I have a feeling that a more optimized implementation could do much better than my naive attempt. I believe my observation (as to my poor Adjacency Matrix writing skills) is accurate considering that it, along with the Adjacency List, is one of the most popular data structures for storing graphs.
The inductive graph performed very well for a relatively new data structure that was intended, from the beginning, to be a new way of storing graphs in functional languages. Its run-time performance was approximately 17% worse than the baseline and memory performance (based on garbage collection) was approximately 31% worse than the baseline. For someone who needs a graph data structure for more than just a getNext
function, it could be a serious contender and warrants additional testing.
Next steps
I am satisfied with the results of this investigation into graph data structures in F#. I was able to implement all algorithms in a functional manner within a generic framework. I believe I have accomplished the goal that I set out with, which was to find a more robust and 'realistic' way of representing graph-like structures in F#.
However, here are some items that I could not (or did not) complete and may be worth looking into:
- Change the base data structures to enforce more of the expected properties / invariants.
- Make the types for
Vertex
,Edge
,Direction
, etc. private and move them within their respective modules. - Test with larger / more complex boards to better understand performance.
- Change the way boards are parsed from text files so that when the same character appears multiple times on a board, it is treated as the same Vertex (allowing for more complex board structures).
- Change boards to allow 3D representations.
- Remove copying from Adjacency Matrix's
create
function. - Implement (or find existing libraries) and test more functions of graph data structures.
I will now be returning to blogging about my solutions for Advent of Code 2016 problems. I am greatly looking forward to this because I have temporarily stopped solving more problems until I catch up with my blog posts. Hopefully, I will be able to post my Day 3 solutions in the next week.
See you next time!